Abstract: In this paper we examine the implications of the statistical large
sample theory for the computational complexity of Bayesian and quasi-Bayesian
estimation carried out using Metropolis random walks. Our analysis is
motivated by the Laplace-Bernstein-Von Mises central limit theorem, which
states that in large samples the posterior or quasi-posterior approaches a
normal density. Using the conditions required for the central limit theorem
to hold, we establish polynomial bounds on the computational complexity of
general Metropolis random walk methods in large samples. Our analysis covers
cases where the underlying log-likelihood or extremum criterion function is
possibly non-concave and discontinuous, and where the parameter dimension is
increasing. However, the central limit theorem restricts the deviations from
continuity and log-concavity of the log-likelihood or extremum criterion
function in a very specific manner.

Under minimal assumptions required for the central limit theorem to hold
under increasing parameter dimension, we show that the Metropolis algorithm
is theoretically efficient even for the canonical Gaussian walk, which is
studied in detail. Specifically, we show that the running time of the
algorithm in large samples is bounded in probability by a polynomial in the
parameter dimension $d$ and, in particular, is of stochastic order $d^2$ in
the leading cases after the burn-in period. We then give applications to
exponential families, curved exponential families, and Z-estimation of
increasing dimension.

AMS 2000 subject classifications: Primary, 65C05; secondary 65C60. 
Keywords and phrases: Markov Chain Monte Carlo, Computational 
Complexity, Bayesian, Increasing Dimension. 

1. Introduction 

Markov Chain Monte Carlo (MCMC) algorithms have dramatically increased 
the use of Bayesian and quasi-Bayesian methods for practical estimation and 
inference. (See e.g. books of Casella and Robert [5], Chib [H], Geweke [IS], Liu 
[35] for detailed treatments of the MCMC methods and their applications in var- 
ious areas of statistics, econometrics, and biometrics.) Bayesian methods rely 
on a likelihood formulation, while quasi-Bayesian methods replace the likelihood 
with other criterion functions. This paper studies the computational complexity 
of MCMC algorithms (based on Metropolis random walks) as both the sample 
and parameter dimensions grow to infinity at the appropriate rates. The paper 
shows how and when the large sample asymptotics places sufficient restrictions 
on the likelihood and criterion functions that guarantee the efficient - that is, 
polynomial time - computational complexity of these algorithms. These results 
suggest that at least in large samples, Bayesian and quasi-Bayesian estimators 
can be computationally efficient alternatives to maximum likelihood and ex- 
tremum estimators, most of all in cases where likelihoods and criterion functions 
are non-concave and possibly non-smooth in the parameters of interest. 

To motivate our analysis, let us consider the Z-estimation problem, which 
is a basic method for estimating various kinds of structural models, especially 
in biometrics and econometrics. The idea behind this approach is to maximize 
some criterion function: 

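$$Q_n(\theta) = -\left\| \frac{1}{\sqrt{n}} \sum_{i=1}^{n} m(U_i, \theta) \right\|^2, \quad \theta \in \Theta \subset \mathbb{R}^d, \qquad (1.1)$$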

where $U_i$ is a vector of random variables, and $m(U_i, \theta)$ is a vector
of functions such that $E[m(U_i, \theta)] = 0$ at the true parameter value
$\theta = \theta_0$. For example, in estimation of conditional
$\alpha$-quantile models with censoring and endogeneity, the functions take
the form

$$m(U_i, \theta) = W \left( \alpha / p_i(\theta) - 1\{Y_i \le X_i'\theta\} \right) Z_i. \qquad (1.2)$$

Here $U_i = (Y_i, X_i, Z_i)$, $Y_i$ is the response variable, and $X_i$ is a
vector of regressors. In the censored regression models, $Z_i$ is the same as
$X_i$, and $p_i(\theta)$ is a weighting function that depends on the
probability of censoring, which in turn depends on $X_i$ and $\theta$ (see
[51] for extensive motivation and details). In the endogenous models, $Z_i$
is a vector of instrumental variables that affect the outcome variable $Y_i$
only through $X_i$ (see [11] for motivation and details), while
$p_i(\theta) = 1$ for each $i$. The matrix $W$ is some positive definite
weighting matrix. Finally, $\alpha \in (0, 1)$ is the quantile index, and
$X_i'\theta$ is the model for the $\alpha$-th quantile function of the
outcome $Y_i$.

In these quantile examples, the criterion function $Q_n(\theta)$ is highly
discontinuous and non-concave, implying that the argmax estimator may be
difficult or impossible to obtain. Figure 1 in Section 5 illustrates this
example and similar examples where the argmax computation is intractable, at
least when the parameter dimension $d$ is high. In typical applications, the
parameter dimension $d$ is indeed high in relation to the sample size (see
e.g. Koenker [32] for a relevant survey). Similar issues can also arise in
M-estimation problems, where the extremum criterion function takes the form
$Q_n(\theta) = \sum_{i=1}^{n} m(U_i, \theta)$, where $U_i$ is a vector of
random variables, and $m(U_i, \theta)$ is a real-valued function, for
example, the log-likelihood function of $U_i$ or some other
pseudo-log-likelihood function. Section 5 discusses several examples of this
kind.

As an alternative to argmax estimation in both the Z- and M-estimation
frameworks, consider the quasi-Bayesian estimator obtained by integration in
place of optimization:

$$\hat\theta = \frac{\int_\Theta \theta \exp\{Q_n(\theta)\}\, d\theta}{\int_\Theta \exp\{Q_n(\theta')\}\, d\theta'}. \qquad (1.3)$$

This estimator may be recognized as a quasi-posterior mean of the
quasi-posterior density $\pi_n(\theta) \propto \exp Q_n(\theta)$. (Of course,
when $Q_n$ is a log-likelihood, the term "quasi" becomes redundant.) This
estimator is not affected by local discontinuities and non-concavities and is
often much easier to compute in practice than the argmax estimator,
particularly in the high-dimensional setting; see, for example, the
discussion in Liu, Tian, and Wei [34] and Chernozhukov and Hong [11].

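To make the contrast between integration and optimization concrete, the following minimal sketch approximates a quasi-posterior mean of the form (1.3) on a grid for a scalar parameter. The criterion `Q_example`, the grid, and all numerical constants are hypothetical choices made only for illustration; they are not taken from the paper.

```python
import numpy as np

def quasi_posterior_mean(Q, grid):
    """Approximate int theta*exp(Q) dtheta / int exp(Q) dtheta on a uniform grid."""
    q = Q(grid)
    w = np.exp(q - q.max())  # subtract the max for numerical stability
    # On a uniform grid the spacing cancels in the ratio of Riemann sums.
    return float(np.sum(grid * w) / np.sum(w))

def Q_example(theta, n=200):
    """Hypothetical criterion: a quadratic around 0.3 plus small jumps."""
    smooth = -0.5 * n * (theta - 0.3) ** 2
    jumps = 0.2 * (np.floor(25.0 * theta) % 2)  # discontinuous, non-concave part
    return smooth + jumps

grid = np.linspace(-1.0, 2.0, 20001)
theta_hat = quasi_posterior_mean(Q_example, grid)
print(theta_hat)
```

The jumps make the argmax of `Q_example` sensitive to the location of the discontinuities, while the integral averages over them, so `theta_hat` stays close to the center of the quadratic part.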
At this point, it is worth emphasizing that we will formally capture the high
parameter dimension by using the framework of Huber [23], Portnoy [42], and
others. In this framework, we have a sequence of models (rather than a fixed
model) where the parameter dimension grows as the sample size grows, namely,
$d \to \infty$ as $n \to \infty$, and we will carry out all of our analysis
in this framework.

This paper will show that if the sample size $n$ grows to infinity and the
dimension of the problem $d$ does not grow too quickly relative to the sample
size, the quasi-posterior

$$\pi_n(\theta) = \frac{\exp\{Q_n(\theta)\}}{\int_\Theta \exp\{Q_n(\theta')\}\, d\theta'} \qquad (1.4)$$

will be approximately normal. This result in turn leads to the main claim: the
estimator (1.3) can be computed using Markov Chain Monte Carlo in polynomial
time, provided that the starting point is drawn from the approximate support
of the quasi-posterior (1.4). As is standard in the literature, we measure
running time in the number of evaluations of the numerator of the
quasi-posterior function (1.4) since this accounts for most of the
computational burden.

In other words, when the central limit theorem (CLT) for the quasi-posterior
holds, the estimator (1.3) is computationally tractable. The reason is that
the CLT, in addition to implying the approximate normality and attractive
estimation properties of the estimator $\hat\theta$, bounds non-concavities
and discontinuities of $Q_n(\theta)$ in a specific manner that implies that
the computational time is polynomial in the parameter dimension $d$. In
particular, in the leading cases the bound on the running time of the
algorithm after the so-called burn-in period is $O_p(d^2)$. Thus, our main
insight is to bring the structure implied by the CLT into the computational
complexity analysis of the MCMC algorithm for computation of (1.3) and
sampling from (1.4).

Our analysis of computational complexity builds on several fundamental papers
studying the computational complexity of Metropolis procedures, especially
Applegate and Kannan [2], Frieze, Kannan and Polson [16], Polson [41],
Kannan, Lovasz and Simonovits [29], Kannan and Li [28], Lovasz and Simonovits
[37], and Lovasz and Vempala [38, 39, 40]. Many of our results and proofs
rely upon and extend the mathematical tools previously developed in these
works. We extend the complexity analysis of the previous literature, which
has focused on the case of an arbitrary concave log-likelihood function, to
the nonconcave and nonsmooth cases. The motivation is that, from a
statistical point of view, in concave settings it is typically easier to
compute a maximum likelihood or extremum estimate than a Bayesian or
quasi-Bayesian estimate, so the latter do not necessarily have practical
appeal. In contrast, when the log-likelihood or quasi-likelihood is
nonsmooth, nonconcave, or both, Bayesian and quasi-Bayesian estimates defined
by integration are relatively attractive computationally, compared to maximum
likelihood or extremum estimators defined by optimization.

Our analysis relies on statistical large sample theory. We invoke limit
theorems for posteriors and quasi-posteriors for large samples as
$n \to \infty$. These theorems are necessary to support our principal task -
the analysis of the computational complexity under the restrictions of the
CLT. As a preliminary step of our computational analysis, we state a CLT for
quasi-posteriors and posteriors under parameters of increasing dimension,
which extends the CLT previously derived in the literature for posteriors and
quasi-posteriors for fixed dimensions. In particular, Laplace c. 1809,
Blackwell [7], Bickel and Yahav [6], Ibragimov and Hasminskii [24], and Bunke
and Milhaud [8] provided CLTs for posteriors. Blackwell [7], Liu, Tian, and
Wei [34], and Chernozhukov and Hong [11] provided CLTs for quasi-posteriors
formed using various non-likelihood criterion functions. In contrast to these
previous results, we allow for increasing dimensions. Ghosal [20] previously
derived a CLT for posteriors with increasing dimension for log-concave
exponential families. We go beyond this canonical setup and establish the CLT
for the non-log-concave and discontinuous cases. We also allow for general
criterion functions to replace likelihood functions. This paper also
illustrates the plausibility of the approach using exponential families,
curved exponential families, and Z-estimation problems. The curved families
arise, for example, when the data must satisfy additional moment
restrictions, as e.g. in Hansen and Singleton [21], Chamberlain [10], and
Imbens [25]. Both the curved exponential families and Z-estimation problems
typically fall outside the log-concave framework.

The rest of the paper is organized as follows. In Section 2 we establish a
generalized version of the Central Limit Theorem for Bayesian and
quasi-Bayesian estimators. This result may be seen as a generalization of the
classical Bernstein-Von Mises theorem, in that it allows the parameter
dimension to grow as the sample size grows. In Section 2 we also formulate
the main problem, which is to characterize the complexity of MCMC sampling
and integration as a function of the key parameters that describe the
deviations of the quasi-posterior from the normal density. Section 3 explores
the structure set forth in Section 2 to find bounds on conductance and mixing
time of the MCMC algorithm. Section 4 derives bounds on the integration time
of the standard MCMC algorithm. Section 5 considers an application to a broad
class of curved exponential families and Z-estimation problems, which have
possibly non-concave and discontinuous criterion functions, and verifies that
our results apply to this class of statistical models. Section 5 also
verifies that the high-level conditions of Section 2 follow from the
primitive conditions for these models.

Comment 1.1 (Notations.) Throughout the paper, we follow the framework of
high dimensional parameters introduced in Huber (1973). In this framework the
parameter $\theta^{(n)}$ of the model, the parameter space $\Theta^{(n)}$,
its dimension $d^{(n)}$, and all other properties of the model itself are
indexed by the sample size $n$, and $d^{(n)} \to \infty$ as $n \to \infty$.
However, following Huber's convention, we will omit the index and write, for
example, $\theta$, $\Theta$, and $d$ as abbreviations for $\theta^{(n)}$,
$\Theta^{(n)}$, and $d^{(n)}$, and so on.

2. The Setup and The Problem 

Our analysis is motivated by the problems of estimation and inference in
large samples under high dimension. We consider a "reduced-form" setup
formulated in terms of parameters that characterize local deviations from the
true parameter value. The local parameter $\lambda$ describes contiguous
deviations from the true parameter shifted by a first order approximation to
an extremum estimator $\hat\theta$. That is, for $\theta$ denoting a
parameter vector, $\theta_0$ the true value, and
$s = \sqrt{n}(\hat\theta - \theta_0)$ the normalized first order
approximation of the extremum estimator, we define the local parameter
$\lambda$ as

$$\lambda = \sqrt{n}(\theta - \theta_0) - s.$$

The parameter space for $\theta$ is $\Theta$, and the parameter space for
$\lambda$ is therefore $\Lambda = \sqrt{n}(\Theta - \theta_0) - s$.

The corresponding localized likelihood or localized criterion function is
denoted by $\ell(\lambda)$. For example, suppose $L_n(\theta)$ is the
original likelihood function in the likelihood framework or, more generally,
$L_n(\theta)$ is $\exp\{Q_n(\theta)\}$, where $Q_n(\theta)$ is the criterion
function in the extremum framework; then

$$\ell(\lambda) = L_n\big(\theta_0 + (\lambda + s)/\sqrt{n}\big) \big/ L_n\big(\theta_0 + s/\sqrt{n}\big).$$

The assumptions below will be stated directly in terms of $\ell(\lambda)$. In
Section 5 we further illustrate the connection between the localized set-up
and the non-localized set-ups and provide more primitive conditions within
the exponential family, curved exponential family, and Z-estimation
framework.

Then, the posterior or quasi-posterior density for $\lambda$ takes the form,
implicitly indexed by the sample size $n$,

$$f(\lambda) = \frac{\ell(\lambda)}{\int_\Lambda \ell(\omega)\, d\omega}, \qquad (2.5)$$

and we impose conditions that force the posterior to satisfy a CLT in the
sense of approaching the normal density

$$\phi(\lambda) = \frac{1}{(2\pi)^{d/2} \det(J^{-1})^{1/2}} \exp\left( -\frac{1}{2} \lambda' J \lambda \right). \qquad (2.6)$$

More formally, the following conditions are assumed to hold for
$\ell(\lambda)$ as the sample size and parameter dimension grow to infinity:

$$n \to \infty \quad \text{and} \quad d \to \infty.$$

We call these conditions the "CLT conditions":

C.1 The local parameter $\lambda$ belongs to the local parameter space
    $\Lambda \subset \mathbb{R}^d$. The vector $s$ is a zero mean vector with
    variance $\Omega$, whose eigenvalues are bounded above as $n \to \infty$,
    and $\Lambda = K \cup K^c$, where $K$ is a closed ball $B(0, \|K\|)$ such
    that $\int_K f(\lambda)\, d\lambda \ge 1 - o_p(1)$ and
    $\int_K \phi(\lambda)\, d\lambda \ge 1 - o(1)$.

C.2 The lower semi-continuous posterior or quasi-posterior function
    $\ell(\lambda)$ approaches a quadratic form in logs, uniformly in $K$,
    i.e., there exist positive approximation errors $\epsilon_1$ and
    $\epsilon_2$ such that for every $\lambda \in K$,

    $$\left| \ln \ell(\lambda) - \left( -\frac{1}{2} \lambda' J \lambda \right) \right| \le \epsilon_1 + \epsilon_2 \cdot \lambda' J \lambda / 2, \qquad (2.7)$$

    where $J$ is a symmetric positive definite matrix with eigenvalues
    bounded away from zero and from above uniformly in the sample size $n$.
    Also, we denote the ellipsoidal norm induced by $J$ as
    $\|v\|_J := \|J^{1/2} v\|$.
C.3 The approximation errors $\epsilon_1$ and $\epsilon_2$ satisfy
    $\epsilon_1 = o_p(1)$ and $\epsilon_2 \cdot \|K\|_J^2 = o_p(1)$.
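The envelope in condition C.2 can be checked numerically on a finite set of points. The sketch below computes, for a given $\epsilon_2$, the smallest $\epsilon_1 \ge 0$ for which the inequality in C.2 holds at every supplied point. The function name `smallest_eps1`, the example log-criterion, and the grid are assumptions introduced for illustration; a finite grid can only give a lower bound on the $\epsilon_1$ needed over all of $K$.

```python
import numpy as np

def smallest_eps1(log_ell, lam, J, eps2):
    """Smallest eps1 >= 0 with |ln l + lam'J lam/2| <= eps1 + eps2*lam'J lam/2
    at every row of lam (a finite-sample check, not a proof over all of K)."""
    lam = np.atleast_2d(lam)
    quad = np.einsum("ij,jk,ik->i", lam, J, lam)      # lam' J lam for each row
    deviation = np.abs(log_ell(lam) + 0.5 * quad)     # distance from the quadratic
    return float(max(0.0, np.max(deviation - 0.5 * eps2 * quad)))

# Hypothetical one-dimensional example: a quadratic plus jumps of size 0.1.
J = np.array([[2.0]])
lam_grid = np.linspace(-3.0, 3.0, 601).reshape(-1, 1)

def log_ell_example(lam):
    x = lam[:, 0]
    return -0.5 * 2.0 * x ** 2 + 0.1 * np.sign(np.sin(5.0 * x))

eps1 = smallest_eps1(log_ell_example, lam_grid, J, eps2=0.0)
print(eps1)
```

In this example the jumps have constant size, so they are absorbed entirely by $\epsilon_1$; a deviation that grows with $\lambda' J \lambda$ would instead require a positive $\epsilon_2$.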

Comment 2.1 We choose the support set $K = B(0, \|K\|)$, which is a ball of
radius $\|K\| = \sup_{\lambda \in K} \|\lambda\|$, as follows. Under
increasing dimension, the normal density is subject to a concentration of
measure, namely that selecting $\|K\| \ge C \cdot \sqrt{d}$, for a
sufficiently large constant $C$, is enough to contain the support of the
standard normal vector. Indeed, let $Z \sim N(0, I_d)$; then
$\Pr(Z \notin K) = \Pr(\|Z\|^2 > C^2 d) \to 0$ for $C > 1$ as
$d \to \infty$, because $\|Z\|^2 / d \to_p 1$. For the case where
$W \sim N(0, J^{-1}) = J^{-1/2} Z$, we have that
$\Pr(W \notin K) \le \Pr(\|Z\| / \sqrt{\lambda_{\min}} > \|K\|) \to 0$ for
$\|K\| \ge C \sqrt{d / \lambda_{\min}}$ for $C > 1$ as $d \to \infty$, where
$\lambda_{\min}$ denotes the smallest eigenvalue of $J$. Moreover, since
$\|K\|_J = \sqrt{\lambda_{\max}} \|K\|$, where $\lambda_{\max}$ denotes the
largest eigenvalue of $J$, we need to have that
$\|K\|_J \ge \sqrt{d \lambda_{\max} / \lambda_{\min}}$. In view of condition
C.3, this requires $\epsilon_2 d \lambda_{\max} / \lambda_{\min} = o_p(1)$
and hence $\epsilon_2 d = o_p(1)$. Thus, in some of the computations
presented below, we will set

$$\|K\| = C \sqrt{d / \lambda_{\min}} \quad \text{and} \quad \|K\|_J = C \sqrt{d \lambda_{\max} / \lambda_{\min}} \quad \text{for } C > 1.$$

Finally, even though we make the assumption of bounded eigenvalues of $J$, we
will emphasize the dependence on the eigenvalues in most proofs and formal
statements. This will allow us to see immediately the impact of changing this
assumption.

Fig 1. This figure illustrates how $\ln \ell(\lambda)$ can deviate from
$\ln g(\lambda)$, allowing for possible discontinuities in
$\ln \ell(\lambda)$.
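The concentration of measure argument in Comment 2.1 can be illustrated by simulation. The sketch below estimates $\Pr(\|Z\|^2 > C^2 d)$ for a standard normal vector $Z$ in two dimensions $d$; the function name, the number of draws, the seed, and the choice $C = 1.2$ are assumptions made for illustration only.

```python
import numpy as np

def prob_outside_ball(d, C, n_draws=5000, seed=0):
    """Monte Carlo estimate of Pr(||Z||^2 > C^2 d) for Z ~ N(0, I_d)."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n_draws, d))
    return float(np.mean(np.sum(Z ** 2, axis=1) > C ** 2 * d))

p_small_d = prob_outside_ball(d=4, C=1.2)     # noticeable mass outside the ball
p_large_d = prob_outside_ball(d=400, C=1.2)   # essentially no mass outside
print(p_small_d, p_large_d)
```

For a fixed $C > 1$ the estimated probability shrinks as $d$ grows, which is the sense in which a ball of radius of order $\sqrt{d}$ serves as an approximate support.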

These conditions imply that

$$\ell(\lambda) = g(\lambda) \cdot m(\lambda)$$

over the approximate support set $K$, where

$$\ln g(\lambda) = -\frac{1}{2} \lambda' J \lambda, \qquad (2.8)$$

$$-\epsilon_1 - \epsilon_2 \lambda' J \lambda / 2 \le \ln m(\lambda) \le \epsilon_1 + \epsilon_2 \lambda' J \lambda / 2. \qquad (2.9)$$

Figure 1 illustrates the kinds of deviations of $\ln \ell(\lambda)$ from the
quadratic curve captured by the parameters $\epsilon_1$ and $\epsilon_2$, and
also shows the types of discontinuities and non-convexities permitted in our
framework. Parameter $\epsilon_1$ controls the size of local discontinuities
and parameter $\epsilon_2$ controls the global tilting away from the
quadratic shape of the normal log-density.

Theorem 1 (Generalized CLT for Quasi-Posteriors) Under conditions C.1-C.3,
the quasi-posterior density (2.5) approaches the normal density (2.6) in the
following sense:



Theorem 1 is a simple preliminary result. However, the result is essential
for defining the environment in which the main results of this paper - the
computational complexity results - will be developed. The theorem shows that
in large samples, provided that some regularity conditions hold, Bayesian and
quasi-Bayesian inference have good large sample properties. The main part of
the paper, namely Section 3, develops the computational implications of the
CLT conditions. In particular, Section 3 shows that polynomial time computing
of Bayesian and quasi-Bayesian estimators by MCMC is in fact implied by the
CLT conditions. Therefore, the CLT conditions are essential both for good
statistical properties of the posterior or quasi-posterior under increasing
dimension, as shown in Theorem 1, and for good computational properties, as
shown in Section 3.


By allowing increasing dimension ($d \to \infty$), Theorem 1 extends the CLT
previously derived in the literature for posteriors in the likelihood
framework (Blackwell [7], Bickel and Yahav [6], Ibragimov and Hasminskii
[24], Bunke and Milhaud [8], Ghosal [20], Shen [46]) and for quasi-posteriors
in the general extremum framework, when the likelihood is replaced by general
criterion functions (Blackwell [7], Liu, Tian, and Wei [34], and Chernozhukov
and Hong [11]). The theorem also extends the results in Ghosal [20], who also
considered increasing dimensions but focused his analysis on the exponential
likelihood family framework. In contrast, Theorem 1 allows for
non-exponential families and for quasi-posteriors in place of posteriors.
Recall that quasi-posteriors result from using quasi-likelihoods and other
criterion functions in place of the likelihood. This substantially expands
the scope of the applications of the result. Importantly, Theorem 1 allows
for non-smoothness and even discontinuities in the likelihood and criterion
functions, which are pertinent in a number of applications listed in the
introduction.

The Problem of the Paper. Our problem is to characterize the complexity of
obtaining draws from $f(\lambda)$ and of Monte Carlo integration for
computing

$$\int g(\lambda) f(\lambda)\, d\lambda,$$

where $f(\lambda)$ is restricted to the approximate support $K$. The
procedure used to obtain the basic draws as well as to carry out Monte Carlo
integration is a Metropolis random walk, which is a standard MCMC algorithm
used in practice. The tasks are thus:

I. Characterize the complexity of sampling from $f(\lambda)$ as a function of
   $(d, n, \epsilon_1, \epsilon_2, K)$;
II. Characterize the complexity of calculating
   $\int g(\lambda) f(\lambda)\, d\lambda$ as a function of
   $(d, n, \epsilon_1, \epsilon_2, K)$;
III. Characterize the complexity of sampling from $f(\lambda)$ and performing
   integrations with $f(\lambda)$ in large samples as $d, n \to \infty$ by
   invoking the bounds on $(d, n, \epsilon_1, \epsilon_2, K)$ imposed by the
   CLT;
IV. Verify that the CLT conditions are applicable in a variety of statistical
   problems.

This paper formulates and solves this problem. Thus, the paper brings the CLT
restrictions into the complexity analysis and develops complexity bounds for
sampling and integrating from $f(\lambda)$ under these restrictions. These
CLT restrictions, arising from the use of large sample theory and the
imposition of certain regularity conditions, limit the behavior of
$f(\lambda)$ over the approximate support set $K$ in a specific manner that
allows us to establish polynomial computing time for sampling and
integration. Because the conditions for the CLT do not provide strong
restrictions on the tail behavior of $f(\lambda)$ outside $K$ other than C.1,
our analysis of complexity is limited entirely to the approximate support set
$K$ defined in C.1-C.3.

By solving the above problem, this paper contributes to the recent literature
on the computational complexity of Metropolis procedures. Early work was
primarily concerned with the question of approximating the volume of high
dimensional convex sets, where uniform densities play a fundamental role
(Lovasz and Simonovits [37], Kannan, Lovasz and Simonovits [29, 30]). Later,
the approach was generalized to the cases where the log-likelihood is concave
(Frieze, Kannan and Polson [16], Polson [41], and Lovasz and Vempala
[38, 39, 40]). However, under log-concavity the maximum likelihood or
extremum estimators are usually preferred over Bayesian or quasi-Bayesian
estimators from a computational point of view. Cases in which log-concavity
is absent, which are the settings in which there is great practical appeal
for using Bayesian and quasi-Bayesian estimates, have received little
treatment in the literature. One important exception is the paper of
Applegate and Kannan [2], which covers nearly-log-concave but smooth
densities using a discrete Metropolis algorithm. In contrast to Applegate and
Kannan [2], our approach allows for both discontinuous and non-log-concave
densities that are permitted to deviate from the normal density (not from an
arbitrary log-concave density, as in Applegate and Kannan [2]) in a specific
manner. The manner in which they deviate from the normal is motivated by the
CLT and controlled by parameters $\epsilon_1$ and $\epsilon_2$, which are in
turn restricted by the CLT conditions. Using the CLT restrictions also allows
us to treat non-discrete sampling algorithms. In fact, it is known that the
canonical Gaussian walk analyzed in Section 3.2.4 does not have good
complexity properties (that is, it is not rapidly mixing) for arbitrary
log-concave density functions; see Lovasz and Vempala [ID]. Nonetheless, the
CLT conditions imply enough structure so that even a canonical Gaussian walk
becomes in fact rapidly mixing. Moreover, the analysis is general in that it
applies to any Metropolis chain, provided that it satisfies a simple
geometric condition. We illustrate this condition with the canonical
algorithm. This suggests that the same approach can be used to establish
polynomial bounds for various more sophisticated schemes. Finally, as is
standard in the literature, we assume that the starting point for the
algorithm occurs in the approximate support of the posterior. Indeed, the
polynomial time bound that we derive applies only in this case because this
is the domain where the CLT provides enough structure on the problem. Our
analysis does not apply outside this domain.

3. The complexity of sampling using random walks 
3.1. Set-Up and Main Result 

In this section we bound the computational complexity of obtaining a draw
from a random variable approximately distributed according to a density
function $f$ as defined in (2.5). (Section 4 builds upon these results to
study the associated integration problem.) By invoking condition C.1, we
restrict our attention entirely to the approximate support set $K$, and the
accuracy of sampling will be defined over this set. Consider a measurable
space $(K, \mathcal{A})$. Our task is to draw a random variable according to
a density function $f$ restricted to $K$. This density induces a probability
distribution on $K$ defined by
$Q(A) = \int_A f(x)\, dx / \int_K f(x)\, dx$ for any $A \in \mathcal{A}$.
Asymptotically, it is well-known that random walks combined with a Metropolis
filter are capable of performing such a task. Such random walks are
characterized by an initial point $u_0$ and a one-step probability
distribution, which depends on the current point, to generate the next
candidate point of the random walk. The candidate point is accepted with a
probability given by the Metropolis filter, which depends on the likelihood
function $\ell$, on the current point, and on the candidate point; otherwise
the random walk stays at the current point (see Casella and Robert [9] and
Vempala [50] for details; Section 3.2.4 describes the canonical Gaussian
random walk).

In the complexity analysis of this algorithm we are interested in bounding
the number of steps of the random walk required to draw a random variable
from $Q$ with a given precision. Equivalently, we are interested in bounding
the number of evaluations of the local likelihood function $\ell$ required
for this purpose.

Next, following Lovasz and Simonovits [37] and Vempala [50], we review
definitions of concepts relevant for our analysis. Let $q(x|u)$ denote the
probability density to generate a candidate point and $1_u(A)$ be the
indicator function of the set $A$. For each $u \in K$ the one-step
distribution $P_u$, the probability distribution after one step of the random
walk starting from $u$, is defined as

$$P_u(A) = \int_{K \cap A} \min\left\{ \frac{f(x)\, q(u|x)}{f(u)\, q(x|u)}, 1 \right\} q(x|u)\, dx + (1 - p_u)\, 1_u(A), \qquad (3.10)$$

where

$$p_u = \int_K \min\left\{ \frac{f(x)\, q(u|x)}{f(u)\, q(x|u)}, 1 \right\} q(x|u)\, dx \qquad (3.11)$$

is the probability of making a proper move, namely the move to $x \in K$,
$x \ne u$, after one step of the chain from $u \in K$.
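The one-step distribution in (3.10)-(3.11) can be simulated directly. The sketch below implements a Metropolis random walk with a symmetric Gaussian proposal, for which $q(u|x) = q(x|u)$ and the ratio in the filter reduces to $f(x)/f(u)$. Candidates outside $K$ are treated as rejections, which is one reading of the integrals in (3.10)-(3.11) being taken over $K$. The function name, the step size `sigma`, the example density, and the radius are assumptions made for illustration and are not the tuning analyzed in the paper.

```python
import numpy as np

def metropolis_gaussian_walk(log_f, u0, sigma, radius, n_steps, seed=0):
    """Random-walk Metropolis on the ball K = B(0, radius) with N(u, sigma^2 I)
    proposals. Returns the chain and the fraction of proper moves."""
    rng = np.random.default_rng(seed)
    u = np.asarray(u0, dtype=float)
    log_fu = log_f(u)
    chain = np.empty((n_steps, u.size))
    n_moves = 0
    for t in range(n_steps):
        x = u + sigma * rng.standard_normal(u.size)   # candidate point
        if np.linalg.norm(x) <= radius:               # candidates outside K are rejected
            log_fx = log_f(x)
            # Metropolis filter: accept with probability min{f(x)/f(u), 1}.
            if rng.uniform() < np.exp(min(0.0, log_fx - log_fu)):
                u, log_fu = x, log_fx
                n_moves += 1
        chain[t] = u                                  # stay at u if not accepted
    return chain, n_moves / n_steps

# Hypothetical discontinuous, non-log-concave target close to N(0, I_2).
def log_f_example(lam):
    return -0.5 * float(lam @ lam) + 0.1 * (np.floor(4.0 * lam[0]) % 2)

radius = 3.0 * np.sqrt(2.0)
chain, move_rate = metropolis_gaussian_walk(log_f_example, np.zeros(2), 1.0, radius, 20000)
print(chain.mean(axis=0), move_rate)
```

Only the unnormalized log-density enters the filter, so each step costs one evaluation of the numerator of the quasi-posterior, which matches the way running time is counted in the text.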

The triple $(K, \mathcal{A}, \{P_u : u \in K\})$, along with a starting
distribution $Q_0$, defines a Markov chain in $K$. We denote by $Q_t$ the
probability distribution obtained after $t$ steps of the random walk. A
distribution $Q$ is called stationary on $(K, \mathcal{A})$ if for any
$A \in \mathcal{A}$,

$$\int_K P_u(A)\, dQ(u) = Q(A). \qquad (3.12)$$

Given the random walk described earlier, the unique stationary probability
distribution $Q$ is induced by the function $f$,
$Q(A) = \int_A f(x)\, dx / \int_K f(x)\, dx$ for all $A \in \mathcal{A}$; see
e.g. Casella and Robert [9]. This is the main motivation for most of the MCMC
studies found in the literature since it provides an asymptotic method to
approximate the density of interest. As mentioned before, our goal is to
properly quantify this convergence, and for that we need to review additional
concepts.

The ergodic flow of a set $A$ with respect to a distribution $Q$ is defined
as

$$\Phi(A) = \int_A P_u(K \setminus A)\, dQ(u).$$

It measures the probability of the event $\{u \in A, u' \notin A\}$ where $u$
is distributed according to $Q$ and $u'$ is distributed according to $P_u$;
it captures the average flow of points leaving $A$ in one step of the random
walk. The measure $Q$ is stationary if and only if
$\Phi(A) = \Phi(K \setminus A)$ for all $A \in \mathcal{A}$ since

$$\Phi(A) = \int_A P_u(K \setminus A)\, dQ(u) = \int_A (1 - P_u(A))\, dQ(u)$$
$$= Q(A) - \int_A P_u(A)\, dQ(u) = \int_K P_u(A)\, dQ(u) - \int_A P_u(A)\, dQ(u)$$
$$= \Phi(K \setminus A).$$

A Markov chain is said to be ergodic if $\Phi(A) > 0$ for every $A$ with
$0 < Q(A) < 1$, which is the case for the Markov chain induced by the random
walk described earlier due to the assumptions on $f$, namely conditions C.1
and C.2.

Next we recall the concept of a conductance of a Markov chain, which plays a
key role in the convergence analysis. Intuitively, a Markov chain will
converge slowly to the steady state if there exists a set $A$ in which the
Markov chain stays "too long" relative to the measure of $A$ or its
complement $K \setminus A$. In order for a Markov chain to stay in $A$ for a
long time, the probability of stepping out of $A$ with the random walk must
be small, that is, the ergodic flow of $A$ must be small relative to the
measures of $A$ and $K \setminus A$. The concept of conductance of a set $A$
quantifies this notion:

$$\phi(A) = \frac{\Phi(A)}{\min\{Q(A), Q(K \setminus A)\}}.$$

The global conductance of the Markov chain is the minimum conductance over
sets with positive measure

$$\phi = \inf_{A \in \mathcal{A}: 0 < Q(A) < 1} \phi(A). \qquad (3.13)$$

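For intuition, the ergodic flow and conductance can be computed exactly in a finite-state analogue of the definitions above, where integrals become sums and the infimum in (3.13) is a minimum over proper subsets. The sketch below does this by brute force for a small chain; the transition matrix, its stationary distribution, and the function name are assumptions chosen for illustration, and the continuous state space treated in the paper is not covered by this computation.

```python
from itertools import combinations
import numpy as np

def global_conductance(P, pi):
    """Minimum over proper subsets A of flow(A) / min{pi(A), pi(A^c)} for a
    finite chain with transition matrix P and stationary distribution pi."""
    n = len(pi)
    best = np.inf
    for size in range(1, n):
        for subset in combinations(range(n), size):
            A = list(subset)
            Ac = [j for j in range(n) if j not in subset]
            # Ergodic flow: stationary probability of leaving A in one step.
            flow = sum(pi[i] * P[i, j] for i in A for j in Ac)
            best = min(best, flow / min(pi[A].sum(), pi[Ac].sum()))
    return float(best)

# Lazy random walk on a path with three states.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = np.array([0.25, 0.50, 0.25])   # satisfies pi @ P == pi
phi = global_conductance(P, pi)
print(phi)
```

The enumeration visits every proper subset, so it is only feasible for very small chains; it is meant to make the definition tangible rather than to be used at scale.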

Belloni and Chernozhukov/ Complexity of MCMC 






Lovasz and Simonovits [37] proved the connection between conductance and
convergence for the continuous state space, and Jerrum and Sinclair [57]
proved the connection for the discrete state space. We will extensively use
Corollary 1.5 of Lovasz and Simonovits [37], restated here as follows: Let
$Q_0$ be $M$-warm with respect to the stationary distribution $Q$, namely

$$\sup_{A \in \mathcal{A}: Q(A) > 0} \frac{Q_0(A)}{Q(A)} \le M; \qquad (3.14)$$

then the total variation distance between the stationary distribution $Q$ and
the distribution $Q_t$, obtained after $t$ steps of the Markov chain starting
from $Q_0$, is bounded above by a function of the global conductance $\phi$
and the warmness parameter $M$:

$$\|Q_t - Q\|_{TV} = \sup_{A \in \mathcal{A}} |Q_t(A) - Q(A)| \le \sqrt{M} \left( 1 - \frac{\phi^2}{2} \right)^t. \qquad (3.15)$$
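A bound of the form $\sqrt{M}(1 - \phi^2/2)^t$ translates directly into a sufficient number of steps for a target accuracy $\epsilon$: solving $\sqrt{M}(1 - \phi^2/2)^t \le \epsilon$ for $t$ gives $t \ge \ln(\sqrt{M}/\epsilon) / (-\ln(1 - \phi^2/2))$. The sketch below evaluates this; the function names and the numerical inputs are assumptions chosen for illustration.

```python
import math

def tv_bound(t, phi, M):
    """Upper bound sqrt(M) * (1 - phi^2 / 2)^t on the total variation distance."""
    return math.sqrt(M) * (1.0 - 0.5 * phi ** 2) ** t

def steps_for_accuracy(phi, M, eps):
    """Smallest integer t for which tv_bound(t, phi, M) <= eps."""
    if math.sqrt(M) <= eps:
        return 0
    rate = -math.log(1.0 - 0.5 * phi ** 2)   # per-step contraction in logs
    return math.ceil(math.log(math.sqrt(M) / eps) / rate)

t_needed = steps_for_accuracy(phi=0.5, M=4.0, eps=0.01)
print(t_needed, tv_bound(t_needed, 0.5, 4.0))
```

Since $-\ln(1 - \phi^2/2) \ge \phi^2/2$, the required number of steps is at most of order $\ln(M/\epsilon)/\phi^2$, so a conductance with $1/\phi$ polynomial in $d$ yields a polynomial number of steps.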

Therefore, the global conductance $\phi$ determines the number of steps
required to generate a random point whose distribution $Q_t$ is within a
specified distance of the target distribution $Q$. The conductance $\phi$
also bounds the autocovariance between consecutive elements of the Markov
chain, which is important for analyzing the computational complexity of
integration by MCMC; see Section 4 for a more detailed discussion. The
warmness parameter $M$, which measures how the starting distribution $Q_0$
differs from the target distribution $Q$, also plays an important role in
determining the quality of convergence of $Q_t$ to $Q$. In what follows, we
will calculate $M$ explicitly for the canonical random walk.

The main result of this paper provides a lower bound for the global
conductance of the Markov chain $\phi$ under the CLT conditions. In
particular, we show that $1/\phi$ is bounded by a fixed polynomial in the
dimension of the parameter space even for a canonical random walk considered
in Section 3.2.4. In order to show this, we require the following geometric
condition on the difference between the one-step distributions.

D.1 There exist positive sequences $h_n$ and $c_n$ such that for every
    $u, v \in K$, $\|u - v\| \le h_n$ implies that

    $$\|P_u - P_v\|_{TV} \le 1 - c_n.$$

D.2 The sequences above can be taken to satisfy the following bounds

    $$\frac{1}{c_n \min\{h_n \sqrt{\lambda_{\min}}, 1\}} = O_p(d).$$

Condition D.1 holds if at least a $c_n$-fraction of the probability
distribution associated with $P_u$ varies smoothly as the point $u$ changes.
Condition D.2 imposes a particular rate for the sequences. As shown in
Theorem 2 below, the rates in Conditions D.1 and D.2 play an important role
in delivering good, that is, polynomial time, computational complexity. We
show in Section 3.2.4 that Conditions D.1 and D.2 hold for the canonical
Gaussian walk under Conditions C.1, C.2, and C.3, with

$$1/h_n = O_p(d) \quad \text{and} \quad 1/c_n = O_p(1),$$

and $\lambda_{\min}$ bounded away from zero. Moreover, the rates in Condition
D.2 appear to be sharp for the canonical Gaussian walk under our framework.
It remains an important question whether different types of random walks
could lead to better rates than those in Condition D.2 (see Vempala [50] for
a relevant survey). Another interesting question is the establishment of
lower bounds on the computational complexity of the type considered in Lovasz
[36].
Next we state the main result of the section.

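Theorem 2 (Main Result on Complexity of Sampling) Under Conditions
C.1, C.2, and D.1, the global conductance of the induced Markov chain satisfies

$$1/\phi = O\left( \frac{e^{2(\epsilon_1 + \epsilon_2 \|K\|_J^2/2)}}{c_n \min\{h_n \sqrt{\lambda_{\min}},\, 1\}} \right). \qquad (3.16)$$

In particular, a random walk satisfying these assumptions requires at most

$$N_\epsilon = O_p\left( \frac{e^{4(\epsilon_1 + \epsilon_2 \|K\|_J^2/2)}}{\left(c_n \min\{h_n \sqrt{\lambda_{\min}},\, 1\}\right)^2} \ln(M/\epsilon) \right) \qquad (3.17)$$

steps to achieve $\|Q_{N_\epsilon} - Q\|_{TV} < \epsilon$, where $Q_0$ is $M$-warm with respect to $Q$.
Finally, if Conditions C.1, C.2, C.3, D.1 and D.2 hold, we have that

$$1/\phi = O_p(d)$$

and the number of steps $N_\epsilon$ is bounded by

$$O_p(d^2 \ln(M/\epsilon)). \qquad (3.18)$$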

Thus, under the CLT conditions, Theorem 2 establishes the polynomial bound
on the computing time, as stated in equation (3.18). Indeed, CLT conditions
C.1 and C.2 first lead to the bound (3.17) and, then, condition C.3, which
imposes $\epsilon_1 = o_p(1)$ and $\epsilon_2 \cdot \|K\|_J^2 = o_p(1)$, leads to the
polynomial bound (3.18). If the stated CLT conditions do not hold, the
bound on the computing time need not be polynomial: in particular, the first
bound (3.17) is exponential in $\epsilon_1$ and $\epsilon_2 \|K\|_J^2$. It is also useful
to note that the approximate normality of posteriors and quasi-posteriors
implied by the CLT conditions plays an important role in the proofs of this
main result and of auxiliary lemmas. Therefore, the CLT conditions are
essential for both (a) good statistical properties of the posterior or
quasi-posterior under increasing dimension, as shown in Theorem 1, and (b) good
computational properties, as shown in Theorem 2. Thus, results (a) and (b)
establish a clear link between the computational properties and the
statistical environment.

The relevance of the particular random walk in bounding the conductance is
captured through the parameters $c_n$ and $h_n$ defined in condition D.1.
Theorem 2 shows that as long as we can take $1/c_n$ and $1/h_n$ to be bounded
by a polynomial in the dimension of the parameter space $d$, we will obtain
polynomial time guarantees for the sampling problem. In some cases, the
warmness parameter $M$ appearing in (3.18) can also be related to the
particular random walk being used. This is the case in the canonical random
walk discussed in detail in Section 3.2.4.

3.2. Proof of the Main Result 

The proof of Theorem [2] relies on a new iso-perimetric inequality (Corollary 1) 
and a geometric property of the particular random walk (condition D.l). After 
the connection between the iso-perimetric inequality and the ergodic flow is 
established, the geometric property allows us to use the first result to bound 
the conductance from below. In what follows we provide an outline of the proof, 
auxiliary results, and, finally, the formal proof. 

3.2.1. Outline of the Proof 

The proof follows the arguments in Lovász and Simonovits [37] and Lovász
and Vempala [38]. In order to bound the ergodic flow of $A \in \mathcal{A}$, consider
the particular disjoint partition $K = S_1 \cup S_2 \cup S_3$ where $S_1 \subset A$,
$S_2 \subset K \setminus A$, and $S_3$ consists of points in $A$ or $K \setminus A$ for which the
one-step probability of going to the other set is at least $c_n/2$ (to be
defined later). Therefore we have

$$\Phi(A) = \int_A P_u(K \setminus A)\,dQ(u) = \frac{1}{2}\int_A P_u(K \setminus A)\,dQ(u) + \frac{1}{2}\int_{K \setminus A} P_u(A)\,dQ(u)$$
$$\geq \frac{1}{2}\int_{S_1} P_u(K \setminus A)\,dQ(u) + \frac{1}{2}\int_{S_2} P_u(A)\,dQ(u) + \frac{c_n}{4}\,Q(S_3),$$

where the second equality holds because $\Phi(A) = \Phi(K \setminus A)$.

Since the first two terms could be arbitrarily small, the result will follow
by bounding the last term from below. This will be achieved by a new
iso-perimetric inequality tailored to the CLT framework and derived in Section
3.2.2. This result will provide a lower bound on $Q(S_3)$, which is increasing in
the distance between $S_1$ and $S_2$.

Therefore, it remains to show that the distance between $S_1$ and $S_2$ is suitably
bounded below. This follows from the geometric property stated in condition
D.1. Given two points $u \in S_1$ and $v \in S_2$, we have $P_u(K \setminus A) < c_n/2$ and
$P_v(A) < c_n/2$. Therefore, the total variation distance between their one-step
distributions is bounded as

$$\|P_u - P_v\|_{TV} \geq |P_u(A) - P_v(A)| > 1 - c_n.$$

In such a case, condition D.1 implies that the distance $\|u - v\|$ is bounded from
below by $h_n$. Since $u$ and $v$ are arbitrary points, the distance between sets $S_1$
and $S_2$ is bounded below by $h_n$.

This leads to a lower bound for the global conductance. After bounding the
global conductance from below, Theorem 2 follows by invoking the conductance
theorem of [37] restated in equation (3.15) and the CLT conditions.

3.2.2. An Iso-perimetric Inequality

We start by defining a notion of approximate log-concavity. A function
$f : \mathbb{R}^d \to \mathbb{R}$ is said to be log-$\beta$-concave if for every $\alpha \in [0, 1]$,
$x, y \in \mathbb{R}^d$, we have

$$f(\alpha x + (1-\alpha) y) \geq \beta f(x)^{\alpha} f(y)^{1-\alpha}$$

for some $\beta \in (0, 1]$, and $f$ is said to be log-concave if $\beta$ can be taken to be one.
The class of log-$\beta$-concave functions is rather broad, including, for example,
various non-smooth and discontinuous functions.
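
To make the definition concrete, the following minimal sketch checks the log-$\beta$-concavity inequality on a finite grid for a discontinuous step function. The function `log_m_step`, the grid, and the value of `DELTA` are illustrative assumptions, not objects from the paper; a grid check of this kind can refute the inequality but cannot prove it on all of $\mathbb{R}^d$.

```python
import itertools
import math


def is_log_beta_concave(log_m, points, beta,
                        alphas=(0.0, 0.25, 0.5, 0.75, 1.0), tol=1e-12):
    """Check m(a*x + (1-a)*y) >= beta * m(x)**a * m(y)**(1-a) on a grid.

    The inequality is tested on the log scale for all pairs of grid points
    and all listed values of a (one-dimensional arguments only).
    """
    log_beta = math.log(beta)
    for x, y in itertools.product(points, repeat=2):
        for a in alphas:
            z = a * x + (1.0 - a) * y
            rhs = log_beta + a * log_m(x) + (1.0 - a) * log_m(y)
            if log_m(z) < rhs - tol:
                return False
    return True


DELTA = 0.1  # hypothetical size of the deviation from log-concavity


def log_m_step(x):
    """log m(x) for a discontinuous step function with |log m| <= DELTA."""
    return DELTA if math.floor(x) % 2 == 0 else -DELTA


GRID = [0.25 * k for k in range(-12, 13)]

passes_with_beta = is_log_beta_concave(log_m_step, GRID, math.exp(-2 * DELTA))
passes_with_one = is_log_beta_concave(log_m_step, GRID, 1.0)
```

On this grid the step function violates ordinary log-concavity (take $x = 0.5$, $y = 2.5$, $\alpha = 1/2$) but satisfies the inequality with $\beta = e^{-2\,\texttt{DELTA}}$. This is the generic situation when the logarithm of a function deviates from a concave function by at most `DELTA` in absolute value.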

This concept is relevant under our CLT conditions C.1-C.3, since the relations
(2.8) and (2.9) imposed by these conditions imply the following:

Lemma 1 Over the set $K$, the functions $f(\lambda) := \ell(\lambda)/\int_{\Lambda} \ell(\lambda)\,d\lambda$ and $\ell(\lambda)$ can
be written as the product of a Gaussian function, $e^{-\frac{1}{2}\lambda' J \lambda}$, and a log-$\beta$-concave
function with parameter

$$\beta = e^{-2(\epsilon_1 + \epsilon_2 \|K\|_J^2/2)}.$$

The representation of Lemma 1 gives us a convenient structure to establish
the following iso-perimetric inequality.

Lemma 2 Consider any measurable partition of the form $K = S_1 \cup S_2 \cup S_3$
such that the distance between $S_1$ and $S_2$ is at least $t$, i.e. $d(S_1, S_2) \geq t$. Let
$Q(S) = \int_S f\,dx / \int_K f\,dx$. Then for any lower semi-continuous function
$f(x) = e^{-\|x\|^2} m(x)$, where $m$ is a log-$\beta$-concave function, we have

$$Q(S_3) \geq \beta\, \frac{2 t e^{-t^2/4}}{\sqrt{\pi}}\, \min\{Q(S_1), Q(S_2)\}.$$

The iso-perimetric inequality of Lemma 2 states that if two subsets of $K$ are
far apart, the measure of the remaining subset of $K$ should be comparable to
the measure of at least one of the original subsets. This iso-perimetric inequality
extends the iso-perimetric inequality in Kannan and Li [28]. The proof builds
on their proof as well as on the ideas in Applegate and Kannan [5]. Unlike the
inequality in Kannan and Li [28], Lemma 2 removes the smoothness assumptions
on $f$, covering both non-log-concave and discontinuous cases.

The following corollary extends Lemma 2 to the case of an arbitrary
covariance matrix $J$.

Corollary 1 (Iso-perimetric Inequality) Consider any measurable partition
of the form $K = S_1 \cup S_3 \cup S_2$ such that $d(S_1, S_2) \geq t$, and let
$Q(S) = \int_S f\,dx / \int_K f\,dx$. Then, for any lower semi-continuous function
$f(x) = e^{-\frac{1}{2} x' J x} m(x)$, where $m$ is a log-$\beta$-concave function and $J$ is a
positive definite covariance matrix, we have

$$Q(S_3) \geq \beta \sqrt{\lambda_{\min}}\, t\, e^{-\lambda_{\min} t^2/8} \sqrt{\frac{2}{\pi}}\, \min\{Q(S_1), Q(S_2)\},$$

where $\lambda_{\min}$ denotes the minimum eigenvalue of $J$.

3.2.3. Proof of Theorem 2

Fix an arbitrary set $A \in \mathcal{A}$ and denote by $A^c = K \setminus A$ the complement of $A$
with respect to $K$. We will prove that

$$\Phi(A) \geq \frac{c_n}{4}\, \beta \sqrt{\frac{2}{\pi e}}\, \min\left\{\frac{h_n \sqrt{\lambda_{\min}}}{2},\, 1\right\} \min\{Q(A), Q(A^c)\}, \qquad (3.19)$$

where $\beta = e^{-2(\epsilon_1 + \epsilon_2 \|K\|_J^2/2)}$ is as defined in Lemma 1. This result implies the
desired bound on the global conductance $\phi$.
Consider the following auxiliary definitions:

$$S_1 = \{u \in A : P_u(A^c) < c_n/2\}, \quad S_2 = \{v \in A^c : P_v(A) < c_n/2\}, \quad S_3 = K \setminus (S_1 \cup S_2).$$

In the case $Q(S_1) < Q(A)/2$, we have

$$\Phi(A) = \int_A P_u(A^c)\,dQ(u) \geq \int_{A \setminus S_1} P_u(A^c)\,dQ(u) \geq \int_{A \setminus S_1} \frac{c_n}{2}\,dQ(u)$$
$$\geq \frac{c_n}{2}\, Q(A \setminus S_1) \geq \frac{c_n}{4}\, Q(A),$$

which immediately implies the inequality (3.19). In the case $Q(S_2) < Q(A^c)/2$,
we apply a similar argument.

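In the remaining case $Q(S_1) \geq Q(A)/2$ and $Q(S_2) \geq Q(A^c)/2$, we proceed
as follows. Since $\Phi(A) = \Phi(A^c)$ we have that

$$\Phi(A) = \int_A P_u(A^c)\,dQ(u) = \frac{1}{2}\int_A P_u(A^c)\,dQ(u) + \frac{1}{2}\int_{A^c} P_v(A)\,dQ(v)$$
$$\geq \frac{1}{2}\int_{A \setminus S_1} P_u(A^c)\,dQ(u) + \frac{1}{2}\int_{A^c \setminus S_2} P_v(A)\,dQ(v) \geq \frac{c_n}{4}\, Q(S_3),$$

where we used that $S_3 = K \setminus (S_1 \cup S_2) = (A \setminus S_1) \cup (A^c \setminus S_2)$. Given the definitions
of the sets $S_1$ and $S_2$, for every $u \in S_1$ and $v \in S_2$ we have

$$\|P_u - P_v\|_{TV} \geq P_u(A) - P_v(A) = 1 - P_u(A^c) - P_v(A) > 1 - c_n.$$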

In such a case, by condition D.1, we have that $\|u - v\| \geq h_n$ for every $u \in S_1$
and $v \in S_2$. Thus, we can apply the iso-perimetric inequality of Corollary 1
with $d(S_1, S_2) \geq h_n$, to bound $Q(S_3)$. We then obtain

$$\int_A P_u(A^c)\,dQ(u) \geq \frac{c_n}{4} \max_{0 \leq t \leq h_n} \beta \sqrt{\lambda_{\min}}\, t\, e^{-\lambda_{\min} t^2/8} \sqrt{\frac{2}{\pi}}\, \min\{Q(S_1), Q(S_2)\}$$
$$\geq \frac{c_n}{4}\, \beta \sqrt{\frac{2}{\pi e}}\, \min\left\{\frac{h_n \sqrt{\lambda_{\min}}}{2},\, 1\right\} \min\{Q(A), Q(A^c)\},$$

where we used the fact that $\max_{0 \leq t \leq h_n} \sqrt{\lambda_{\min}}\, t\, e^{-\lambda_{\min} t^2/8}$ is bounded below by
$\min\{h_n \sqrt{\lambda_{\min}}, 2\}\, e^{-1/2}$ and that $\min\{Q(S_1), Q(S_2)\} \geq \min\{Q(A), Q(A^c)\}/2$.
Thus, the inequality (3.19) and the lower bound on conductance (3.16) follow.
The bound (3.17) on the number of steps of the Markov chain follows from
the lower bound on conductance (3.16) and the conductance theorem of [37]
restated in equation (3.15). The remaining results in Theorem 2 follow by invoking
the CLT conditions. ■
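
The elementary maximization used in the last step of the proof can be verified directly. The sketch below reads the exponent in Corollary 1 as $\lambda_{\min} t^2/8$ and introduces the substitution $s = \sqrt{\lambda_{\min}}\, t$ purely for convenience; the symbol $g$ here is local to this calculation.

```latex
\begin{align*}
g(s) &:= s\, e^{-s^2/8}, \qquad s = \sqrt{\lambda_{\min}}\, t \in \big[0,\ h_n \sqrt{\lambda_{\min}}\big], \\
g'(s) &= \Big(1 - \tfrac{s^2}{4}\Big) e^{-s^2/8} \ \geq\ 0 \quad \text{for } 0 \leq s \leq 2, \\
h_n \sqrt{\lambda_{\min}} \geq 2 \ &\Longrightarrow\ \max_s g(s) \geq g(2) = 2\, e^{-1/2}, \\
h_n \sqrt{\lambda_{\min}} < 2 \ &\Longrightarrow\ \max_s g(s) \geq g\big(h_n \sqrt{\lambda_{\min}}\big) \geq h_n \sqrt{\lambda_{\min}}\; e^{-1/2}, \\
\text{hence} \quad \max_{0 \leq t \leq h_n} \sqrt{\lambda_{\min}}\, t\, e^{-\lambda_{\min} t^2/8}
 \ &\geq\ \min\big\{h_n \sqrt{\lambda_{\min}},\ 2\big\}\, e^{-1/2}.
\end{align*}
```

The second implication uses $s < 2 \Rightarrow s^2/8 < 1/2$, so that $e^{-s^2/8} > e^{-1/2}$.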



3.2.4. The case of the Gaussian random walk

In order to provide a concrete example of our complexity bounds, we consider
the canonical random walk induced by a Gaussian distribution. Such a random
walk is completely characterized by an initial point $u_0$, a fixed standard
deviation $\sigma > 0$, and its one-step move. The latter is defined by the procedure of
drawing a point $y$ from a Gaussian distribution centered at the current point
$u$ with covariance matrix $\sigma^2 I$, and then if $y \in K$ moving to $y$ with probability
$\min\{f(y)/f(u), 1\} = \min\{\ell(y)/\ell(u), 1\}$, and otherwise staying at $u$.
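
The one-step move described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `metropolis_step`, `run_chain`, and `log_gaussian` are hypothetical, the set $K$ is taken to be a Euclidean ball of radius `radius`, and the target in the usage lines is a standard Gaussian chosen only for demonstration.

```python
import math
import random


def metropolis_step(u, log_f, sigma, radius, rng):
    """One move of the Gaussian random walk on K = B(0, radius)."""
    # Propose y ~ N(u, sigma^2 I).
    y = [ui + sigma * rng.gauss(0.0, 1.0) for ui in u]
    # Proposals outside K are rejected: the chain stays at u.
    if math.sqrt(sum(yi * yi for yi in y)) > radius:
        return u, False
    # Accept with probability min{f(y)/f(u), 1}, computed on the log scale.
    log_ratio = log_f(y) - log_f(u)
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return y, True
    return u, False


def run_chain(u0, log_f, sigma, radius, n_steps, seed=0):
    """Run the walk from u0; return the path and the fraction of accepted moves."""
    rng = random.Random(seed)
    u = list(u0)
    path, accepted = [u], 0
    for _ in range(n_steps):
        u, moved = metropolis_step(u, log_f, sigma, radius, rng)
        accepted += int(moved)
        path.append(u)
    return path, accepted / n_steps


def log_gaussian(x):
    """Hypothetical target: log f(x) = -x'x/2 (J = I, no perturbation)."""
    return -0.5 * sum(xi * xi for xi in x)


d = 5
path, rate = run_chain([0.0] * d, log_gaussian, sigma=0.5,
                       radius=3.0 * math.sqrt(d), n_steps=2000)
```

Only the ratio $f(y)/f(u)$ enters the acceptance step, so the normalizing constant of $f$ is never needed; this is why $f$ and $\ell$ induce the same walk.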
We start with the following auxiliary result. 

Lemma 3 Let $a : \mathbb{R}^n \to \mathbb{R}$ be a function such that $\ln a$ is Lipschitz with
constant $L$ over a compact set $K$. Then, for every $u \in K$ and $r > 0$,

$$\inf_{y \in B(u,r) \cap K} \left[a(y)/a(u)\right] \geq e^{-Lr}.$$

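Given the ball $K = B(0, \|K\|)$, we can bound the Lipschitz constant of the
function $-\lambda' J \lambda/2$ by

$$L = \sup_{\lambda \in K} \|J\lambda\| = \lambda_{\max} \|K\|. \qquad (3.20)$$

We define the parameter $\sigma$ of the Gaussian random walk as

$$\sigma = \min\left\{\frac{1}{4\sqrt{d}\, L},\ \frac{\|K\|}{30 d}\right\}. \qquad (3.21)$$

Using (3.20) and that $\|K\| > \sqrt{d/\lambda_{\min}}$, we have

$$\sigma \geq \frac{1}{120\, \lambda_{\max} \sqrt{d}\, \|K\|}. \qquad (3.22)$$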

In order to apply Theorem 2 we rely on $\sigma$ being defined in (3.21) as a function
of the relevant theoretical quantities. More practical choices of the parameter,
as in Roberts and Rosenthal [44] and Gelman, Roberts and Gilks [17], suggest
that we tune the parameter to ensure a particular average acceptance rate for
the steps of the Markov chain. These cases are exactly the cases covered by our
(theoretical) choice of $\sigma$ (of course, different constant acceptance rates lead to
different constants in the proof of the theorem). Moreover, a different choice of
covariance matrix for the auxiliary Gaussian distribution can lead to
improvements in practice but, under the assumptions on the matrix $J$, does not affect
the overall dependence on the dimension $d$, which is our focus here.

Next we verify conditions D.1 and D.2 for the Gaussian random walk.
Although this approach follows that in Lovász and Vempala [55J SO] , there are
two important differences which call for a new proof. First, we no longer rely
on the log-concavity of $f$. Second, we use a different random walk.

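Lemma 4 Let $K := B(0, \|K\|)$, suppose that $\sigma \leq \min\left\{\frac{1}{4\sqrt{d}\, L}, \frac{\|K\|}{30 d}\right\}$, and
$\|u - v\| < \frac{\sigma}{8}$, where $L$ is the Lipschitz constant specified in equation (3.20).
Under conditions C.1-C.2, we have for $\beta = e^{-2(\epsilon_1 + \epsilon_2 \|K\|_J^2/2)}$ that

$$\|P_u - P_v\|_{TV} < 1 - \frac{\beta}{3e}.$$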

Comment 3.1 Therefore, the Gaussian random walk satisfies condition D.1
with

$$c_n = \frac{\beta}{3e} \quad \text{and} \quad h_n = \frac{\sigma}{8}. \qquad (3.23)$$

Under the CLT framework, i.e. conditions C.1, C.2, and C.3, we have that $c_n$
and $h_n$ as defined in (3.23) satisfy condition D.2 with

$$1/h_n = O_p(d) \quad \text{and} \quad 1/c_n = O_p(1),$$

and $\lambda_{\min}$ bounded away from zero.

By applying Theorem 2 to the Gaussian random walk, the conductance bound
(3.16) becomes

$$1/\phi = O\left(d\, e^{2(\epsilon_1 + \epsilon_2 \|K\|_J^2/2)}\right) = O_p(d)$$

and the bound on the number of steps $N_\epsilon$ in (3.17) becomes

$$O_p(d^2 \ln(M/\epsilon)). \qquad (3.24)$$

Next we discuss and bound the dependence on $M$, the "distance" of the
initial distribution $Q_0$ from the stationary distribution $Q$ as defined in (3.14).
A natural candidate for a starting distribution $Q_0$ is the one-step distribution
conditional on a proper move from an arbitrary point $u \in K$. Thus,

$$Q_0(A) = p_u^{-1} \int_{K \cap A} \min\left\{\frac{f(x)\, q(u|x)}{f(u)\, q(x|u)},\, 1\right\} q(x|u)\,dx,$$

where

$$p_u = \int_K \min\left\{\frac{f(x)\, q(u|x)}{f(u)\, q(x|u)},\, 1\right\} q(x|u)\,dx$$

is the probability of a proper move, namely the move to $x \in K$, $x \neq u$, after
one step of the chain from $u \in K$. We emphasize that, in general, such a choice
of $Q_0$ could lead to values of $M$ that are arbitrarily large. In fact, this could
happen even in the case of the stationary density being a uniform distribution
on a convex set (see Lovász and Vempala [ID]). However, this is not the case
under the CLT framework as shown by the following lemma.

Lemma 5 Suppose conditions C.1-C.2 hold. Then for $\beta = e^{-2(\epsilon_1 + \epsilon_2 \|K\|_J^2/2)}$ we
have that with a probability $p_u \geq \beta/(3e)$ the random walk makes a proper move.
Moreover, let $u \in K$ and $Q_0$ be the associated one-step distribution conditional
on performing a proper move starting from $u$; then $Q_0$ is $M$-warm with respect
to $Q$, where

$$\ln M = O\left(d \ln(\|K\|_J) + \|K\|_J^2 + \epsilon_1 + \epsilon_2 \|K\|_J^2\right).$$

Under conditions $\epsilon_1 = o_p(1)$, $\epsilon_2 \|K\|_J^2 = o_p(1)$, and $\|K\|_J = O(\sqrt{d})$ we have

$$\ln M = O_p(d \ln d) \quad \text{and} \quad p_u \geq 1/(3e) + o_p(1).$$

Comment 3.2 (Overall Complexity for Gaussian Walk) The combination
of this result with relation (3.24), which was derived from Theorem 2, yields the
overall (burn-in plus post burn-in) running time

$$O_p(d^3 \ln d).$$
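
The arithmetic behind this rate is short. The display below is a sketch that treats the accuracy level $\epsilon$ as fixed and uses only the step bound $O_p(d^2 \ln(M/\epsilon))$ for the Gaussian walk and the warmness bound of Lemma 5.

```latex
\begin{align*}
N_\epsilon &= O_p\big(d^2 \ln(M/\epsilon)\big) = O_p\big(d^2 (\ln M + \ln \epsilon^{-1})\big), \\
\ln M &= O_p(d \ln d) \quad \text{(Lemma 5)}, \\
N_\epsilon &= O_p\big(d^2 \cdot d \ln d\big) = O_p(d^3 \ln d) \quad \text{for fixed } \epsilon .
\end{align*}
```

The extra factor of $d \ln d$ relative to the $d^2$ post burn-in rate is therefore entirely due to the warmness of the starting distribution.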

4. The complexity of Monte Carlo integration 

This section considers our second task of interest, that of computing a high
dimensional integral of a bounded real valued function $g$:

$$\mu_g = \int g(\lambda)\,dQ(\lambda). \qquad (4.25)$$

Theorem 2 showed that the CLT conditions provide enough structure to bound
the conductance of the Markov chain associated with a particular random walk.
Below we also show how the conductance and CLT-based bounds on
conductance impact the computational complexity of calculating (4.25) via standard
schemes (long run, multiple runs, and subsampling). These new
characterizations complement the previous well-known characterizations of the error in
estimating (4.25) in terms of the covariance functions of the underlying chain
(Geyer [115], Casella and Roberts [9], and Fishman [15]).

In what follows, a random variable $\lambda^t$ is distributed according to $Q_t$, the
probability measure obtained after iterating the chain $t$ times, beginning from
a starting measure $Q_0$. The chain $\lambda^i, i = 0, 1, \ldots$ has the stationary distribution
$Q$. Accordingly, a standard estimate of (4.25), called the long-run (lr) average,
takes the form

$$\hat{\mu}_g = \frac{1}{N} \sum_{i=B}^{B+N} g(\lambda^i), \qquad (4.26)$$

discarding the first $B$ draws, the burn-in sample, and using subsequent $N$ draws
of the Markov chain.

The dependent nature of the chain increases the number of post-burn-in 
draws N needed to achieve a desired precision compared to the infeasible case 
of independent draws from Q. It turns out that, as in the preceding analysis, 
the conductance of the Markov chain is crucial for determining the appropriate
N. 

The starting point of our analysis is a central limit theorem for reversible
Markov chains due to Kipnis and Varadhan [31]: Consider a reversible Markov
chain on $K$ with a stationary distribution $Q$. The lag $k$ autocovariance of the
stationary time series $g(\lambda^i), i = 1, 2, \ldots$, obtained by starting the Markov chain
with the stationary distribution $Q$ is defined as

$$\gamma_k = \mathrm{Cov}_Q\left(g(\lambda^i), g(\lambda^{i+k})\right).$$

Then, for a stationary, irreducible, reversible Markov chain,

$$N E[(\hat{\mu}_g - \mu_g)^2] \to \sigma_g^2 = \sum_{k=-\infty}^{+\infty} \gamma_k, \qquad (4.27)$$

almost surely. If $\sigma_g^2$ is finite, then

$$\sqrt{N}(\hat{\mu}_g - \mu_g) \to_d N(0, \sigma_g^2). \qquad (4.28)$$

In our case, $\gamma_0$ is finite since $g$ is bounded. Let us recall a result, which is due
to Lovász and Simonovits [37], and which states that $\sigma_g^2$ can be bounded using
the global conductance $\phi$ of a stationary, irreducible, reversible Markov chain:
Let $g$ be a square integrable function with respect to the stationary measure $Q$,
then

$$|\gamma_k| \leq \left(1 - \frac{\phi^2}{2}\right)^{|k|} \gamma_0 \quad \text{and} \quad \sigma_g^2 \leq \frac{4}{\phi^2}\, \gamma_0. \qquad (4.29)$$
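
The step from the autocovariance bound to the variance bound is a geometric series. The sketch below assumes the autocovariance bound has the geometric form $|\gamma_k| \leq (1 - \phi^2/2)^{|k|} \gamma_0$ and that $\gamma_{-k} = \gamma_k$ by stationarity; the shorthand $\rho$ is introduced here only for the calculation.

```latex
\begin{align*}
\sigma_g^2 = \sum_{k=-\infty}^{+\infty} \gamma_k
 &\leq \gamma_0 + 2 \sum_{k=1}^{\infty} |\gamma_k|
 \leq \gamma_0 \Big(1 + 2 \sum_{k=1}^{\infty} \rho^k\Big),
 \qquad \rho := 1 - \tfrac{\phi^2}{2}, \\
 &= \gamma_0 \Big(1 + \frac{2\rho}{1-\rho}\Big)
 = \gamma_0\, \frac{1+\rho}{1-\rho}
 = \gamma_0 \Big(\frac{4}{\phi^2} - 1\Big)
 \leq \frac{4 \gamma_0}{\phi^2}.
\end{align*}
```

Under these assumptions, a conductance of order $1/d$ inflates the asymptotic variance by a factor of at most order $d^2$ relative to independent sampling.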

We will use these conductance-based bounds to obtain bounds on the complexity 
of integration under the CLT conditions. 

There exist other methods for constructing the sequence of draws used in
estimators of the type (4.26); we refer to Geyer [19] for a detailed
discussion. In addition to the long run (lr) method, we also consider the subsample
(ss) and multi-start (ms) methods. Denote the number of post burn-in draws
corresponding to each method as $N_{lr}$, $N_{ss}$, and $N_{ms}$. As mentioned above, the
long run method consists of generating the first point using the starting
distribution $Q_0$ and, after the burn-in period, selecting the $N_{lr}$ subsequent points to
compute the sample average. The subsample method also uses only one sample
path, but the $N_{ss}$ draws used in the sample average are spaced out by $S$ steps
of the chain. Finally, the multi-start method uses $N_{ms}$ different sample paths,
initializing each one independently from the starting probability distribution
$Q_0$ and picking the last draw in each sample path after the burn-in period to
compute the average. Thus, all estimators discussed above take the form

$$\hat{\mu}_g = \frac{1}{N} \sum_{i=1}^{N} g(\lambda^{i,B}),$$

with the underlying sequence $\lambda^{1,B}, \lambda^{2,B}, \ldots, \lambda^{N,B}$ produced as follows:

• for lr, $\lambda^{i,B} = \lambda^{i+B}$, where $B$ is the burn-in period,

• for ss, $\lambda^{i,B} = \lambda^{iS+B}$, where $S$ is the number of draws being skipped,

• for ms, $\lambda^{i,B}$ are i.i.d. draws from $Q_B$, that is, $\lambda^{i,B} \sim \lambda^B$ for every $i$.
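
The three sampling schemes can be sketched as follows. This is an illustration under stated assumptions rather than the paper's code: the functions `long_run`, `subsample`, and `multi_start` are hypothetical names, `step` stands for any one-step transition of the chain, and the toy autoregressive chain and clipped function in the usage lines are placeholders chosen only so that the example runs and the number of chain steps can be counted.

```python
import random


def long_run(step, start, g, B, N, rng):
    """lr: one path; average g over the N draws that follow B burn-in steps."""
    x = start(rng)
    for _ in range(B):
        x = step(x, rng)
    total = 0.0
    for _ in range(N):
        x = step(x, rng)
        total += g(x)
    return total / N


def subsample(step, start, g, B, N, S, rng):
    """ss: one path; after burn-in, keep every S-th draw."""
    x = start(rng)
    for _ in range(B):
        x = step(x, rng)
    total = 0.0
    for _ in range(N):
        for _ in range(S):
            x = step(x, rng)
        total += g(x)
    return total / N


def multi_start(step, start, g, B, N, rng):
    """ms: N independent paths; keep only the draw after B steps of each."""
    total = 0.0
    for _ in range(N):
        x = start(rng)
        for _ in range(B):
            x = step(x, rng)
        total += g(x)
    return total / N


# Toy ingredients (assumptions for illustration only).
calls = {"n": 0}


def toy_step(x, rng):
    calls["n"] += 1  # count chain transitions
    return 0.5 * x + rng.gauss(0.0, 1.0)


def toy_start(rng):
    return 0.0


def g_clip(x):
    return max(-1.0, min(1.0, x))  # a bounded function g


def run_and_count(scheme, **kwargs):
    calls["n"] = 0
    estimate = scheme(toy_step, toy_start, g_clip, rng=random.Random(1), **kwargs)
    return estimate, calls["n"]


lr = run_and_count(long_run, B=50, N=200)         # uses 50 + 200 steps
ss = run_and_count(subsample, B=50, N=200, S=5)   # uses 50 + 5 * 200 steps
ms = run_and_count(multi_start, B=50, N=200)      # uses 50 * 200 steps
```

The step counts make the cost structure visible: the burn-in is paid once for lr and ss but once per draw for ms.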

There is a final issue that must be addressed. Both the central limit theorem
of [31], restated in equations (4.27) and (4.28), and the conductance-based bound
of [37] on covariances, restated in equation (4.29), require that the initial point
be drawn from the stationary distribution $Q$. However, we are starting the chain
from some other distribution $Q_0$, and in order to apply these results we need to
first run the chain for sufficiently many steps $B$, to bring the distribution of the
draws $Q_B$ close to $Q$ in total variation metric. This is what we call the
burn-in period. However, even after the burn-in period there is still a discrepancy
between $Q$ and $Q_B$, which should be taken into account. But once $Q_B$ is close
to $Q$, we can use the results on complexity of integration where sampling starts
with $Q$ to bound the complexity of integration where sampling starts with $Q_B$,
where the bound depends on the discrepancy between $Q_B$ and $Q$. Thus, our
computational complexity calculations take into account all of the following
three facts: (i) we are starting with a distribution $Q_0$ that is $M$-warm with
respect to $Q$, (ii) from $Q_0$ we are making $B$ steps with the chain in the burn-in
period to obtain $Q_B$ such that $\|Q_B - Q\|_{TV}$ is sufficiently small, and (iii) we
are only using draws after the burn-in period to approximate the integral.

We use the mean square error as the measure of closeness for a consistent
estimator:

$$MSE(\hat{\mu}_g) = E[(\hat{\mu}_g - \mu_g)^2].$$

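Theorem 3 (Complexity of Integration) Let $Q_0$ be $M$-warm with respect
to $Q$, and let $\bar{g} := \sup_{\lambda \in K} |g(\lambda)|$. In order to obtain

$$MSE(\hat{\mu}_g) < \epsilon$$

it is sufficient to use the following lengths of the burn-in sample, $B$, and
post-burn-in samples, $N_{lr}$, $N_{ss}$, $N_{ms}$:

$$B = \frac{2}{\phi^2} \ln\left(\frac{24 \sqrt{M}\, \bar{g}^2}{\epsilon}\right)$$

and

$$N_{lr} = \frac{\gamma_0}{\epsilon} \frac{6}{\phi^2}, \quad N_{ss} = \frac{3\gamma_0}{\epsilon} \ \text{ with } \ S = \frac{2}{\phi^2} \ln\left(\frac{6\gamma_0}{\epsilon}\right), \quad N_{ms} = \frac{2\gamma_0}{3\epsilon}.$$

The overall complexities of the lr, ss, and ms methods are thus $B + N_{lr}$, $B + S N_{ss}$,
and $B \times N_{ms}$.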

For convenience, Table 1 tabulates the bounds for the three different schemes.
Note that the dependence on $M$ and $\bar{g}$ is only via log terms. Although the
optimal choice of the method depends on the particular values of the constants,
when $\epsilon \searrow 0$, the long-run algorithm has the smallest (best) bound, while the
multi-start algorithm has the largest (worst) bound on the number of iterations.
Table 2 presents the computational complexities implied by the CLT conditions,
namely

$$\|K\|_J = O(\sqrt{d}), \quad \epsilon_1 = o_p(1), \quad \text{and} \quad \epsilon_2 \|K\|_J^2 = o_p(1),$$

and the Gaussian random walk studied in Section 3.2.4. The table assumes $\gamma_0$
and $\bar{g}$ are constant, though it is straightforward to tabulate the results for the
case where $\gamma_0$ and $\bar{g}$ grow at polynomial speed with $d$. Finally, note that the
bounds apply under a slightly weaker condition than the CLT requires, namely
that $\epsilon_1 = O_p(1)$ and $\epsilon_2 \|K\|_J^2 = O_p(1)$.

Table 1

Burn-in and Post Burn-in Bounds on the Complexity of Integration of a Bounded Function
via Conductance

Method        Quantities              Complexity

Long Run      $B + N_{lr}$            $\frac{2}{\phi^2} \ln\left(\frac{24\sqrt{M}\,\bar{g}^2}{\epsilon}\right) + \frac{2}{\phi^2} \cdot \frac{3\gamma_0}{\epsilon}$
Subsample     $B + N_{ss} \cdot S$    $\frac{2}{\phi^2} \ln\left(\frac{24\sqrt{M}\,\bar{g}^2}{\epsilon}\right) + \frac{2}{\phi^2} \cdot \frac{3\gamma_0}{\epsilon} \ln\left(\frac{6\gamma_0}{\epsilon}\right)$
Multi-start   $B \times N_{ms}$       $\frac{2}{\phi^2} \ln\left(\frac{24\sqrt{M}\,\bar{g}^2}{\epsilon}\right) \times \frac{2\gamma_0}{3\epsilon}$

Table 2

Burn-in and Post Burn-in Bounds on the Complexity of Integration of a Bounded Function
using the Gaussian random walk under the CLT framework with
$\|K\|_J = O(\sqrt{d})$, $\epsilon_1 = o_p(1)$, $\epsilon_2 \|K\|_J^2 = o_p(1)$, and $\bar{g} = O(1)$.

Method        Burn-in Complexity                          Post-burn-in Complexity

Long Run      $O_p(d^3 \ln d \cdot \ln \epsilon^{-1})$    $+\ O_p(d^2 \cdot \epsilon^{-1})$
Subsample     $O_p(d^3 \ln d \cdot \ln \epsilon^{-1})$    $+\ O_p(d^2 \cdot \epsilon^{-1} \cdot \ln \epsilon^{-1})$
Multi-start   $O_p(d^3 \ln d \cdot \ln \epsilon^{-1})$    $\times\ O_p(\epsilon^{-1})$

5. Applications 

In this section we verify that the CLT conditions and the analysis apply to a
variety of statistical problems. In particular, we focus on the MCMC estimator
(1.3) as an alternative to M- and Z-estimators. Here our goal is to derive the
high-level conditions C.1-C.3 from appropriate primitive conditions, and thus
show the efficient computational complexity of the MCMC estimator.

5.1. M-Estimation

We present two examples in M-estimation. We begin with the canonical log- 
concave cases within the exponential family. Then we drop the concavity and 
smoothness assumptions to illustrate the full applicability of the approach de- 
veloped in this paper. 

5.1.1. Exponential Family 

Exponential families play a very important role in statistical estimation, cf.
Lehmann and Casella [33], especially in high-dimensional contexts; see Portnoy
[42], Ghosal [20], and Stone et al. [47]. For example, high-dimensional
situations arise in modern data sets in technometric and econometric applications.
Moreover, exponential families have excellent approximation properties and are
useful for approximation of densities that are not necessarily of the exponential
form; see Stone et al. [47].

We base our discussion on the asymptotic analysis of Ghosal [20]. In order to
simplify the exposition, we invoke the more canonical conditions similar to those
given in Portnoy [42]. Moreover, we assume that these conditions, numbered as
E.1 to E.4, hold uniformly in the sample size $n$.

E.1 Let $X_1, \ldots, X_n$ be iid observations from a $d$-dimensional canonical
exponential family with density

$$h(x; \theta) = \exp(x'\theta - \psi(\theta)),$$

where $\theta \in \Theta$, an open subset of $\mathbb{R}^d$, and $d \to \infty$ as $n \to \infty$. Fix a sequence
of parameter points $\theta_0 \in \Theta$. Set $\mu = \psi'(\theta_0)$ and $J = \psi''(\theta_0)$, the mean
and covariance of the observations, respectively. Following Portnoy [42],
we implicitly re-parameterize the problem, so that the Fisher information
matrix $J = I$.

For a given prior $\pi$ on $\Theta$, the posterior density of $\theta$ over $\Theta$ conditioned on
the data takes the form

$$\pi_n(\theta) \propto \pi(\theta) \cdot \prod_{i=1}^{n} h(X_i; \theta) = \pi(\theta) \cdot \exp\left(n \bar{X}'\theta - n\psi(\theta)\right),$$

where $\bar{X} = \sum_{i=1}^{n} X_i/n$ is the empirical mean of the data.

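We associate every point $\theta$ in the parameter space with a local parameter
$\lambda \in \Lambda = \sqrt{n}(\Theta - \theta_0) - s$, where

$$\lambda = \sqrt{n}(\theta - \theta_0) - s,$$

and $s = \sqrt{n}(\bar{X} - \mu)$ is a first order approximation to the normalized maximum
likelihood/extremum estimate. By design, we have that $E[s] = 0$ and $E[ss'] = I_d$.
Moreover, by Chebyshev's inequality, the norm of $s$ can be bounded in
probability, $\|s\| = O_p(\sqrt{d})$. Finally, the posterior density of $\lambda$ over
$\Lambda = \sqrt{n}(\Theta - \theta_0) - s$ is given by $f(\lambda) = \ell(\lambda)/\int_{\Lambda} \ell(\lambda)\,d\lambda$, where

$$\ell(\lambda) = \exp\left(\bar{X}'\sqrt{n}\,\lambda - n\psi\left(\theta_0 + \frac{\lambda + s}{\sqrt{n}}\right) + n\psi\left(\theta_0 + \frac{s}{\sqrt{n}}\right)\right) \times \pi\left(\theta_0 + \frac{\lambda + s}{\sqrt{n}}\right) \Big/ \pi\left(\theta_0 + \frac{s}{\sqrt{n}}\right). \qquad (5.30)$$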



We impose the following regularity conditions, following Ghosal [20] and
Portnoy [42]:

E.2 Consider the following quantities associated with higher moments in a
neighborhood of the true parameter $\theta_0$, uniformly in $n$:

$$B_{1n}(c) := \sup_{\theta, \eta} \{E_\theta |\eta'(x_i - \mu)|^3 : \eta \in S^d, \|\theta - \theta_0\|^2 \leq cd/n\},$$

$$B_{2n}(c) := \sup_{\theta, \eta} \{E_\theta |\eta'(x_i - \mu)|^4 : \eta \in S^d, \|\theta - \theta_0\|^2 \leq cd/n\},$$

where $S^d = \{\eta \in \mathbb{R}^d : \|\eta\| = 1\}$. There are $p > 0$ and $c_0 > 0$ such that
$B_{1n}(c) < c_0 + c^p$ and $B_{2n}(c) < c_0 + c^p$ for all $c > 0$ and all $n$.
E.3 The prior density $\pi$ is proper and satisfies a positivity requirement at the
true parameter

$$\sup_{\theta \in \Theta} \ln\left[\pi(\theta)/\pi(\theta_0)\right] = O(d),$$

where $\theta_0$ is the true parameter. Moreover, the prior $\pi$ also satisfies the
following local Lipschitz condition

$$|\ln \pi(\theta) - \ln \pi(\theta_0)| \leq V(c) \sqrt{d}\, \|\theta - \theta_0\|$$

for all $\theta$ such that $\|\theta - \theta_0\|^2 \leq cd/n$, and some $V(c)$ such that $V(c) < c_0 + c^p$,
with the latter holding for all $c > 0$.

E.4 The parameter dimension $d$ grows at the rate such that $d^3/n \to 0$.

Condition E.2 strengthens an analogous condition of Ghosal [20], and implies
an analogous assumption by Portnoy [42]. Condition E.3 is similar to the
condition on the prior in Ghosal [20]. For further discussion of this condition, see [4].
Condition E.4 states that the parameter dimension should not grow too quickly
relative to the sample size.

Theorem 4 Conditions E.1-E.4 imply conditions C.1-C.3 with $\|K\| = C\sqrt{d}$
for some $C > 1$.

Comment 5.1 Combining Theorems 1 and 4, we have the asymptotic
normality of the posterior,

$$\int_{\Lambda} |f(\lambda) - \phi(\lambda)|\,d\lambda = o_p(1).$$

Furthermore, we can apply Theorem 2 to the posterior density $f$ to bound the
convergence time (number of steps) of the Metropolis walk needed to obtain a
draw from $f$ (with a fixed level of accuracy): The convergence time is at most

$$O_p(d^2)$$

after the burn-in period; together with the burn-in, the convergence time is

$$O_p(d^3 \ln d).$$

Finally, the integration bounds stated in the previous section also apply to the
posterior $f$.

5.1.2. Curved Exponential Family 

Next we consider the case of a $d$-dimensional curved exponential family. The
curved family is general enough to allow for non-concavities and even
non-smoothness in the log-likelihood function, which the canonical exponential
family did not allow for. We assume that the following conditions, numbered as
NE.1 to NE.4, hold uniformly in the sample size $n$, in addition to the previous
conditions E.1 to E.4.

NE.1 Let $X_1, \ldots, X_n$ be iid observations from a $d$-dimensional curved
exponential family with density

$$h(x; \theta) = \exp\left(x'\theta(\eta) - \psi(\theta(\eta))\right).$$

The parameter of interest is $\eta$, whose true value $\eta_0$ lies in the interior of a
convex compact set $\Psi \subset \mathbb{R}^{d_1}$. The true value of $\theta$ induced by $\eta_0$ is given
by $\theta_0 = \theta(\eta_0)$. The mapping $\eta \mapsto \theta(\eta)$ takes values from $\mathbb{R}^{d_1}$ to $\mathbb{R}^d$ where
$c \cdot d \leq d_1 \leq d$, for some $c > 0$. Finally, $d \to \infty$ as $n \to \infty$.
NE.2 The true value $\eta_0$ is the unique solution to the system $\theta(\eta) = \theta_0$, and we have
that $\|\theta(\eta) - \theta(\eta_0)\| \geq \epsilon_0 \|\eta - \eta_0\|$ for some $\epsilon_0 > 0$ and all $\eta \in \Psi$.

Thus, the parameter $\theta$ corresponds to a high-dimensional linear
parametrization of the log-density, and $\eta$ describes the lower-dimensional parametrization
of the log-density. There are many classical examples of curved exponential
families; see for example Efron [14], Lehmann and Casella [33], and
Barndorff-Nielsen [3]. An example of the condition that puts a curved structure onto an
exponential family is a moment restriction of the type:



This condition restricts $\theta$ to lie on a curve that can be parameterized as
$\{\theta(\eta), \eta \in \Psi\}$, where the parameter $\eta = (\alpha, \beta)$ contains the component $\alpha$ as well
as other components $\beta$. In econometric applications, moment restrictions often
represent Euler equations that result from the data $x$ being an outcome of an
optimization by rational decision-makers; see e.g. Hansen and Singleton [21],
Chamberlain [10], Imbens [25], and Donald, Imbens and Newey [13]. Thus, the
curved exponential framework is a fundamental complement of the exponential
framework, at least in certain fields of data analysis.

Fig 2. This figure illustrates the mapping $\theta(\cdot)$. The (discontinuous) solid line is the mapping
while the dash line represents the linear map induced by $G$. The dash-dot line represents the
deviation band controlled by $r_{1n}$ and $R_{2n}$.

We require the following additional regularity conditions on the mapping $\theta(\cdot)$:

NE.3 For every $\kappa$, and uniformly in $\gamma \in B(0, \kappa\sqrt{d})$, there exists a linear operator
$G : \mathbb{R}^{d_1} \to \mathbb{R}^d$ such that $G'G$ has eigenvalues bounded from above and
away from zero, uniformly in $n$, and for every $n$

$$\sqrt{n}\left(\theta(\eta_0 + \gamma/\sqrt{n}) - \theta(\eta_0)\right) = r_{1n} + (I_d + R_{2n}) G \gamma,$$

where $\|r_{1n}\| \leq \delta_{1n}$ and $\|R_{2n}\| \leq \delta_{2n}$ and $\delta_{1n}\sqrt{d} \to 0$ and $\delta_{2n} d \to 0$.

Thus the mapping $\eta \mapsto \theta(\eta)$ is allowed to be nonlinear and discontinuous.
For example, the additional condition of $\delta_{1n} = 0$ implies the continuity of the
mapping in a neighborhood of $\eta_0$. More generally, condition NE.3 does impose
that the map admits an approximate linearization in the neighborhood of $\eta_0$,
whose quality is controlled by the errors $\delta_{1n}$ and $\delta_{2n}$. An example of a kind of
map allowed in this framework is given in Figure 2.

Given a prior $\pi$ on $\Theta$, the posterior of $\eta$ given the data is denoted by

$$\pi_n(\eta) \propto \pi(\theta(\eta)) \cdot \prod_{i=1}^{n} h(X_i; \eta) = \pi(\theta(\eta)) \cdot \exp\left(n \bar{X}'\theta(\eta) - n\psi(\theta(\eta))\right).$$

In this framework, we also define the local parameters to describe contiguous
deviations from the true parameter as

$$\gamma = \sqrt{n}(\eta - \eta_0) - s, \qquad s = (G'G)^{-1} G' \sqrt{n}(\bar{X} - \mu),$$

where $s$ is a first order approximation to the normalized maximum likelihood/extremum
estimate. Further, we have that $E[s] = 0$, $E[ss'] = (G'G)^{-1}$, and $\|s\| = O_p(\sqrt{d})$.
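
The posterior density of $\gamma$ over $\Gamma$, where $\Gamma = \sqrt{n}(\Psi - \eta_0) - s$, is
$f(\gamma) = \ell(\gamma)/\int_{\Gamma} \ell(\gamma)\,d\gamma$, where

$$\ell(\gamma) = \exp\left(n \bar{X}'\left(\theta\left(\eta_0 + \tfrac{\gamma + s}{\sqrt{n}}\right) - \theta\left(\eta_0 + \tfrac{s}{\sqrt{n}}\right)\right)\right)$$
$$\times \exp\left(-n\psi\left(\theta\left(\eta_0 + \tfrac{\gamma + s}{\sqrt{n}}\right)\right) + n\psi\left(\theta\left(\eta_0 + \tfrac{s}{\sqrt{n}}\right)\right)\right) \qquad (5.31)$$
$$\times\ \pi\left(\theta\left(\eta_0 + \tfrac{\gamma + s}{\sqrt{n}}\right)\right) \Big/ \pi\left(\theta\left(\eta_0 + \tfrac{s}{\sqrt{n}}\right)\right).$$

The condition on the prior is the following:

NE.4 The prior $\pi(\eta) \propto \pi(\theta(\eta))$, where $\pi(\theta)$ satisfies condition E.3.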



Theorem 5 Conditions E.1-E.4 and NE.1-NE.4 imply conditions C.1-C.3 with
$\|K\| = C\sqrt{d/\lambda_{\min}}$ for some $C > 1$, where $\lambda_{\min}$ is the minimal eigenvalue of
$J = G'G$.

Comment 5.2 Theorems 1 and 5 imply the asymptotic normality of the
posterior,

$$\int_{\Gamma} |f(\gamma) - \phi(\gamma)|\,d\gamma = o_p(1),$$

where

$$\phi(\gamma) = \frac{1}{(2\pi)^{d_1/2} \det\left((G'G)^{-1}\right)^{1/2}} \exp\left(-\frac{1}{2} \gamma'(G'G)\gamma\right).$$

Theorem 2 implies further that the main results of the paper on the polynomial
time sampling and integration apply to this curved exponential family.



5.2. Z-estimation

Next we turn to the Z-estimation problem, where our basic setup closely follows
the setup in e.g. He and Shao [22]. We make the following assumption that
characterizes the setting. As in the rest of the paper, the dimension of the
parameter space $d$ and other quantities will depend on the sample size $n$.

ZE.0 The data $X_1, \ldots, X_n$ are i.i.d., and there exists a vector-valued moment
function $m : \mathcal{X} \times \mathbb{R}^d \to \mathbb{R}^{d_1}$ such that

$$E[m(X, \theta)] = 0 \ \text{ at the true parameter } \ \theta = \theta_0 \in \Theta_n \subset B(\theta_0, T_n) \subset \mathbb{R}^d.$$

Both the dimension of the moment function $d_1$ and the dimension of the
parameter $d$ grow with the sample size $n$, and we restrict that $c d_1 \leq d \leq d_1$
for some constant $c$. The parameter space $\Theta_n$ is an open convex set
contained in the ball $B(\theta_0, T_n)$ of radius $T_n$, where the radius $T_n$ can grow
with the sample size $n$.
The normalized empirical moment function takes the form

$$S_n(\theta) = \frac{1}{\sqrt{n}} \sum_{i=1}^n m(X_i, \theta).$$

The Z-estimator for $\theta_0$ is defined as the minimizer of the norm $\|S_n(\theta)\|$. However,
in many applications of interest, the lack of continuity or smoothness of
the empirical moments $S_n(\theta)$ can pose serious computational challenges to
obtaining the minimizer. As argued in the introduction, in such cases the MCMC
methodology could be particularly appealing for obtaining the quasi-posterior
means and medians as computationally tractable alternatives to the Z-estimator
based on minimization.
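
To illustrate why minimization can be awkward, the following minimal sketch evaluates $S_n(\theta)$ for a scalar toy problem. The moment function $m(x, \theta) = 1/2 - 1\{x \le \theta\}$ (whose zero is the median), the simulated data, and the grid are assumptions made purely for illustration; they are not taken from the paper.

```python
import math
import random


def normalized_moment(theta, xs):
    """S_n(theta) = n^{-1/2} * sum_i m(x_i, theta) for the hypothetical
    median moment m(x, theta) = 1/2 - 1{x <= theta}."""
    n = len(xs)
    total = sum(0.5 - (1.0 if x <= theta else 0.0) for x in xs)
    return total / math.sqrt(n)


random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(201)]

# S_n is a step function of theta: it is flat between data points and jumps
# by n^{-1/2} at each of them, so |S_n| offers no gradient to follow.
grid = [-1.0 + 0.01 * k for k in range(201)]
values = [abs(normalized_moment(t, xs)) for t in grid]
best = grid[values.index(min(values))]  # grid minimizer of |S_n|
```

In one dimension a grid search is enough, but the cost of a grid grows exponentially with $d$, which is the regime the text is concerned with.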

We then make the following variance and smoothness assumptions on the
moment functions in addition to the basic condition ZE.0:

ZE.1 Let $S^{d_1} = \{\eta \in \mathbb{R}^{d_1} : \|\eta\| = 1\}$ denote the unit sphere. The variance of the
moment function is bounded, namely $\sup_{\eta \in S^{d_1}} E[(\eta' m(X, \theta_0))^2] = O(1)$.
The moment functions have the following continuity property:
$\sup_{\eta \in S^{d_1}} (E[(\eta'(m(X, \theta) - m(X, \theta_0)))^2])^{1/2} \le O(1) \cdot \|\theta - \theta_0\|^{\alpha}$,
uniformly in $\theta \in \Theta_n$, where $\alpha \in (0, 1]$
and is bounded away from zero, uniformly in $n$. Moreover, the family of
functions $\mathcal{F} = \{\eta'(m(X, \theta) - m(X, \theta_0)) : \theta \in \Theta_n \subset \mathbb{R}^d, \eta \in S^{d_1}\}$ is not
very complex, namely the uniform covering entropy of $\mathcal{F}$ is of the same order
as the uniform covering entropy of a Vapnik-Chervonenkis (VC) class
of functions with VC dimension of order $O(d)$, and $\mathcal{F}$ has an envelope $F$
a.s. bounded by $M = O(\sqrt{d})$.

The smoothness assumption covers moment functions both in the smooth
case, where $\alpha = 1$, and the non-smooth case, where $\alpha < 1$. For example, in
the classical mean regression problem, we have the smooth case $\alpha = 1$, and in
the quantile regression problems mentioned in the introduction, we have a
non-smooth case, with $\alpha = 1/2$. The condition on the function class $\mathcal{F}$ is standard
in statistical estimation and, in particular, holds for $\mathcal{F}$ formed as VC classes
or certain stable transformations of VC classes (see van der Vaart and Wellner
[49]). We use the entropy in conjunction with maximal inequalities similar to
those developed in He and Shao [22]. The condition on the envelope is standard,
but it can be replaced by an alternative condition on $\sup_{f \in \mathcal{F}} n^{-1} \sum_{i=1}^n f^4$, see
e.g. He and Shao [22], which can weaken the assumptions on the envelope.

Next we make the following additional smoothness and identification assump- 
tions uniformly in the sample size n. 

ZE.2 The mapping $\theta \mapsto E[m(X, \theta)]$ is continuously twice differentiable with
$\sup_{\|\eta\| = 1} \|\nabla^2_\theta E[m(X, \theta)][\eta, \eta]\|$ bounded by $O(\sqrt{d})$ uniformly in $\theta$,
uniformly in $n$. The eigenvalues of $A'A$, where $A = \nabla_\theta E[m(X, \theta_0)]$ is the
Jacobian matrix, are bounded above and away from zero uniformly in $n$.
Finally, there exist positive numbers $\mu$ and $\delta$ such that uniformly in $n$, the
following identification condition holds

$$\|E[m(X, \theta)]\| \ge \left( \sqrt{\mu}\, \|\theta - \theta_0\| \wedge \delta \right). \qquad (5.32)$$
This condition requires the population moments $E[m(X, \theta)]$ to be approximately
linear in the parameter $\theta$ near the true parameter value $\theta_0$, and also ensures
identifiability of the true parameter value $\theta_0$.

Finally, we impose the following restrictions on the parameter dimension $d$
and the radius of the parameter space $T_n$.

ZE.3 The following conditions hold: (a) $d^4 \log^2 n / n \to 0$, (b) $d^{2+\alpha} \log n / n^{\alpha} \to 0$,
and (c) $d\, T_n^{2\alpha} \log n / n \to 0$.

These conditions are reasonable. Indeed, if we set $\alpha = 1$ and use radius $T_n =
O(d \log n)$ for the parameter space, then we require only that $d^4/n \to 0$, ignoring
logs, which is only slightly stronger than the condition $d^3/n \to 0$ needed in
the exponential family case. In the latter case, the information on higher order
moments leads to the weaker requirement. Also, an important difference here
is that we are using the flat prior in the Z-estimation framework, and this
requires us to restrict the radius of the parameter space by $T_n$. Note that even
though the bounded radius $T_n = O(1)$ is already plausible for many applications,
we can allow for the radius to grow, for example, $T_n = O(d \log n)$ when $\alpha = 1$.
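
As a rough numerical aid, the three quantities in ZE.3 can be tabulated for candidate growth rates of $d$ and $T_n$. The sketch below assumes the reading (a) $d^4 \log^2 n / n$, (b) $d^{2+\alpha} \log n / n^{\alpha}$, (c) $d\, T_n^{2\alpha} \log n / n$; the function name and arguments are hypothetical.

```python
import math


def ze3_ratios(n, d, alpha, radius):
    """Return the three ZE.3 quantities (a), (b), (c); the growth
    conditions ask that each of them tends to zero as n grows."""
    log_n = math.log(n)
    a = d ** 4 * log_n ** 2 / n
    b = d ** (2 + alpha) * log_n / n ** alpha
    c = d * radius ** (2 * alpha) * log_n / n
    return a, b, c


# Example: d grows like n^{1/5}, smooth case alpha = 1, radius d * log(n).
for n in (10 ** 4, 10 ** 8, 10 ** 12):
    d = n ** 0.2
    print(n, ze3_ratios(n, d, 1.0, d * math.log(n)))
```

Such a table only indicates whether a given sequence $(d, T_n)$ is moving in the right direction; it does not verify the asymptotic conditions themselves.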

In order to state the formal results concerning the quasi-posterior, let us
define the quasi-posterior and related quantities. First, we define the criterion
function as $Q_n(\theta) = -\|S_n(\theta)\|^2$, and treat it as a replacement for the
log-likelihood. We will use a flat prior over the parameter space $\Theta_n$, so that the
quasi-posterior density of $\theta$ over $\Theta_n$ takes the form

$$\pi_n(\theta) = \frac{\exp\{Q_n(\theta)\}}{\int_{\Theta_n} \exp\{Q_n(\theta')\}\, d\theta'}.$$

We associate every point $\theta$ in the parameter space $\Theta_n$ with a local parameter
$\lambda \in \Lambda = \sqrt{n}(\Theta_n - \theta_0) - s$, where $\lambda = \sqrt{n}(\theta - \theta_0) - s$, and $s = -(A'A)^{-1} A' S_n(\theta_0)$ is a
first order approximation to the extremum estimate. We have that $E[m(X, \theta_0) m(X, \theta_0)']$
is bounded in the spectral norm, and $(A'A)^{-1} A'$ has a bounded norm, so that
the norm of $s$ can be bounded in probability, $\|s\| = O_p(\sqrt{d})$, by the Chebyshev
inequality. Finally, the quasi-posterior density of $\lambda$ over $\Lambda = \sqrt{n}(\Theta_n - \theta_0) - s$ is
given by

$$f(\lambda) = \frac{\ell(\lambda)}{\int_\Lambda \ell(\lambda)\, d\lambda},$$

where

$$\ell(\lambda) = \exp\left( Q_n(\theta_0 + (\lambda + s)/\sqrt{n}) - Q_n(\theta_0 + s/\sqrt{n}) \right).$$

Theorem 6 Conditions ZE.0-ZE.3 imply conditions C.1-C.3 with $\|K\| = C\sqrt{d/\lambda_{\min}}$
for $C > 1$, where $\lambda_{\min}$ is the minimal eigenvalue of $J = 2A'A$.

Comment 5.3 Theorems 1 and 6 imply the asymptotic normality of the
quasi-posterior,

$$\int_\Lambda |f(\lambda) - \phi(\lambda)|\, d\lambda = o_p(1),$$

where

$$\phi(\lambda) = \frac{1}{(2\pi)^{d/2} \det(J^{-1})^{1/2}} \exp\left( -\tfrac{1}{2} \lambda' J \lambda \right), \qquad J = 2A'A.$$

Theorem 6 implies further that the main results of the paper on the polynomial
time sampling and integration apply to the quasi-posterior density formulated
for the Z-estimation framework.
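
The quasi-posterior above can be sampled with the canonical Gaussian random walk studied in the paper. The following is a minimal sketch under stated assumptions: a scalar parameter, the hypothetical median moment $m(x, \theta) = 1/2 - 1\{x \le \theta\}$, a flat prior on an interval centered at zero (the true $\theta_0$ is unknown in practice), and step size and run lengths chosen by hand rather than from the paper's bounds.

```python
import math
import random


def s_n(theta, xs):
    """Normalized empirical moment for the hypothetical median moment."""
    n = len(xs)
    return sum(0.5 - (1.0 if x <= theta else 0.0) for x in xs) / math.sqrt(n)


def q_n(theta, xs):
    """Criterion Q_n(theta) = -||S_n(theta)||^2 (scalar case)."""
    return -s_n(theta, xs) ** 2


def rw_metropolis(xs, theta_init, radius, step, n_draws, burn_in, rng):
    """Gaussian random-walk Metropolis targeting the density proportional
    to exp(Q_n(theta)) on the interval [-radius, radius]."""
    theta, q_cur = theta_init, q_n(theta_init, xs)
    draws = []
    for it in range(burn_in + n_draws):
        prop = theta + step * rng.gauss(0.0, 1.0)
        if abs(prop) <= radius:  # flat prior: proposals outside are rejected
            q_prop = q_n(prop, xs)
            # accept with probability min{1, exp(q_prop - q_cur)}
            if math.log(1.0 - rng.random()) < q_prop - q_cur:
                theta, q_cur = prop, q_prop
        if it >= burn_in:
            draws.append(theta)
    return draws


rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(201)]
draws = rw_metropolis(xs, 0.0, 5.0, 0.2, 4000, 500, rng)
quasi_posterior_mean = sum(draws) / len(draws)
```

The chain only evaluates $Q_n$, never its gradient, so the jumps in $S_n$ cause no difficulty; the quasi-posterior mean serves as the point estimate in place of the minimizer of $\|S_n(\theta)\|$.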

6. Conclusion 

In this paper we study the implications of the statistical large sample theory 
for computational complexity of Bayesian and quasi-Bayesian estimation car- 
ried out using a canonical Metropolis random walk. Our analysis permits the 
parameter dimension of the problem to grow to infinity and allows the underlying
log-likelihood or extremum criterion function to be discontinuous and/or
non-concave. We establish polynomial complexity by exploiting a central limit 
theorem framework which provides the structural restriction on the problem, 
namely, that the posterior or quasi-posterior density approaches a normal den- 
sity in large samples. 

We focused the analysis on (general) Metropolis random walks and provided 
specific bounds for a canonical Gaussian random walk. Although it is widely 
used for its simplicity, this canonical random walk is not the most sophisticated 
algorithm available. Thus, in principle further improvements could be obtained 
by considering different kinds of algorithms, for example, the Langevin diffu- 
sion [43l HH Hll Q]. (Of course, the algorithm requires a smooth gradient of 
the log-likelihood function, which rules out the nonsmooth and discontinuous 
cases emphasized here.) Another important research direction, as suggested by 
a referee, could be to develop sampling and integration algorithms that most 
effectively exploit the proximity of the posterior to the normal distribution. 
Appendix A: Proofs of Other Results
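Proof of Theorem 1. From C.1 it follows that

$$\int_\Lambda |f(\lambda) - \phi(\lambda)|\, d\lambda \le \int_K |f(\lambda) - \phi(\lambda)|\, d\lambda + o_p(1).$$

Now, denote $C_n = \dfrac{(2\pi)^{d/2} \det(J^{-1})^{1/2}}{\int_K \ell(\omega)\, d\omega}$ and write

$$\int_K \left| \frac{f(\lambda)}{\phi(\lambda)} - 1 \right| \phi(\lambda)\, d\lambda
= \int_K \left| C_n \cdot \exp\left( \ln \ell(\lambda) - \left( -\tfrac{1}{2} \lambda' J \lambda \right) \right) - 1 \right| \phi(\lambda)\, d\lambda.$$

Combining the expansion in C.2 with conditions imposed in C.3,

$$\int_K \left| \frac{f(\lambda)}{\phi(\lambda)} - 1 \right| \phi(\lambda)\, d\lambda
\le \int_K \left| C_n \cdot \exp\left( \epsilon_1 + \epsilon_2 \lambda' J \lambda \right) - 1 \right| \phi(\lambda)\, d\lambda
+ \int_K \left| C_n \cdot \exp\left( -\epsilon_1 - \epsilon_2 \lambda' J \lambda \right) - 1 \right| \phi(\lambda)\, d\lambda$$

$$\le 2 \int_K \left| C_n \cdot e^{o_p(1)} - 1 \right| \phi(\lambda)\, d\lambda \le 2 \left| C_n e^{o_p(1)} - 1 \right|.$$

The proof then follows by showing that $C_n \to_p 1$. Using condition C.1 on the
set $K = B(0, \|K\|)$ and C.2,

$$\frac{1}{C_n} \ge \frac{\int_K \ell(\lambda)\, d\lambda}{(1 + o(1)) \int_K e^{-\frac{1}{2} \lambda' J \lambda}\, d\lambda}
\ge \frac{e^{-\epsilon_1}}{1 + o(1)} \cdot \frac{\int_K e^{-\frac{1}{2} \lambda' (J + \epsilon_2 J) \lambda}\, d\lambda}{\int_K e^{-\frac{1}{2} \lambda' J \lambda}\, d\lambda}$$

$$= \frac{e^{-\epsilon_1}}{1 + o(1)} \sqrt{\frac{\det(J)}{\det(J + \epsilon_2 J)}} \cdot
\frac{\displaystyle \int_K \frac{e^{-\frac{1}{2} \lambda' (J + \epsilon_2 J) \lambda}}{(2\pi)^{d/2} \det((J + \epsilon_2 J)^{-1})^{1/2}}\, d\lambda}
{\displaystyle \int_K \frac{e^{-\frac{1}{2} \lambda' J \lambda}}{(2\pi)^{d/2} \det(J^{-1})^{1/2}}\, d\lambda}.$$

Since $\epsilon_2 < 1/2$, we can define $W \sim N(0, (1 + \epsilon_2)^{-1} J^{-1})$ and $V \sim N(0, J^{-1})$ and
rewrite our bound as

$$\frac{1}{C_n} \ge \frac{e^{-\epsilon_1}}{1 + o(1)} \left( \frac{1}{1 + \epsilon_2} \right)^{d/2} \frac{P(\|W\| \le \|K\|)}{P(\|V\| \le \|K\|)}
\ge \frac{e^{-\epsilon_1}}{1 + o(1)} \left( \frac{1}{1 + \epsilon_2} \right)^{d/2},$$

where the last inequality follows from $P(\|W\| \le \|K\|) \ge P(\|\sqrt{1 + \epsilon_2}\, W\| \le \|K\|) = P(\|V\| \le \|K\|)$. Likewise,

$$\frac{1}{C_n} \le \frac{\int_K \ell(\lambda)\, d\lambda}{\int_K e^{-\frac{1}{2} \lambda' J \lambda}\, d\lambda} \le e^{\epsilon_1} \left( \frac{1}{1 - \epsilon_2} \right)^{d/2}.$$

Therefore $C_n \to_p 1$ since $\epsilon_1 \to_p 0$ and $\epsilon_2 \cdot d \to_p 0$ (cf. Comment 2.1).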
Proof of Lemma 1. The result follows immediately from equations
Proof of Lemma 2. Let $M := \beta \dfrac{2 t e^{-t^2}}{\sqrt{\pi}}$. Take any measurable partition of
$K = S_1 \cup S_2 \cup S_3$, with $d(S_1, S_2) \ge t$. It suffices to prove that

$$\int \left( M 1_{S_i}(x) - 1_{S_3}(x) \right) f(x)\, dx \le 0, \quad \text{for } i = 1 \text{ or } i = 2.$$

We will prove this by contradiction. Suppose that

$$\int \left( M 1_{S_i}(x) - 1_{S_3}(x) \right) f(x)\, dx > 0, \quad \text{for } i = 1 \text{ and } i = 2.$$



We will use the Localization Lemma of Kannan, Lovasz, and Simonovits [29] in
order to reduce a high-dimensional integral to a low-dimensional integral.

Lemma 6 (Localization Lemma) Let $g$ and $h$ be two lower semi-continuous
Lebesgue integrable functions on $\mathbb{R}^d$ such that

$$\int_{\mathbb{R}^d} g(x)\, dx > 0 \quad \text{and} \quad \int_{\mathbb{R}^d} h(x)\, dx > 0.$$

Then there exist two points $a, b \in \mathbb{R}^d$, and a linear function $\tilde\gamma : [0, 1] \to \mathbb{R}_+$
such that

$$\int_0^1 \tilde\gamma^{d-1}(t)\, g((1 - t) a + t b)\, dt > 0 \quad \text{and} \quad \int_0^1 \tilde\gamma^{d-1}(t)\, h((1 - t) a + t b)\, dt > 0,$$

where $([a, b], \tilde\gamma)$ is said to form a needle.

Proof. See Kannan, Lovasz, and Simonovits [29]. ■

By the Localization Lemma, there exists a needle $(a, b, \tilde\gamma)$ such that

$$\int_0^1 \tilde\gamma^{d-1}(t)\, f((1 - t) a + t b) \left( M 1_{S_i}((1 - t) a + t b) - 1_{S_3}((1 - t) a + t b) \right) dt > 0,$$

for $i = 1, 2$. Equivalently, using $\gamma(u) = \tilde\gamma(u/\|b - a\|)$ and $v := (b - a)/\|b - a\|$,
where $\|b - a\| \ge t$, and rearranging we have for $i = 1, 2$,

$$M \int_0^{\|b - a\|} \gamma^{d-1}(u)\, f(a + uv)\, 1_{S_i}(a + uv)\, du > \int_0^{\|b - a\|} \gamma^{d-1}(u)\, 1_{S_3}(a + uv)\, f(a + uv)\, du. \qquad (A.33)$$

In order for the left hand side of (A.33) to be positive for $i = 1$ and $i = 2$, the
line segment $[a, b]$ must contain points in $S_1$ and $S_2$. Since $d(S_1, S_2) \ge t$, we have
that $S_3 \cap [a, b]$ contains an interval $[w, w + t]$ whose length is at least $t$. Thus,
we can partition the line segment $[a, b]$ into $[0, w) \cup [w, w + t] \cup (w + t, \|b - a\|]$.
We will prove that for every $w \in \mathbb{R}$ such that $0 \le w \le w + t \le \|b - a\|$

$$\int_w^{w + t} \gamma^{d-1}(u)\, f(a + uv)\, du \ge M \min\left\{ \int_0^w \gamma^{d-1}(u)\, f(a + uv)\, du,\ \int_{w + t}^{\|b - a\|} \gamma^{d-1}(u)\, f(a + uv)\, du \right\}, \qquad (A.34)$$

which contradicts the relation (A.33) and proves the lemma.

First, note that $f(a + uv) = e^{-\|a + uv\|^2} m(a + uv) = e^{-u^2 + r_1 u + r_0} m(a + uv)$,
where $r_1 := -2a'v$ and $r_0 := -\|a\|^2$. Next, recall that $m(a + uv)\gamma^{d-1}(u)$ is
still a unidimensional log-$\beta$-concave function of $u$. By Lemma 9 presented in
Appendix B, there exists a unidimensional logconcave function $\hat m$ such that
$\beta \hat m(u) \le m(a + uv)\gamma^{d-1}(u) \le \hat m(u)$ for every $u$. Moreover, there exist numbers
$s_0$ and $s_1$ such that $\hat m(w) = s_0 e^{s_1 w}$ and $\hat m(w + t) = s_0 e^{s_1 (w + t)}$. Due to the
log-concavity of $\hat m$, this implies that

$$\hat m(u) \ge s_0 e^{s_1 u} \text{ for } u \in (w, w + t) \quad \text{and} \quad \hat m(u) \le s_0 e^{s_1 u} \text{ otherwise}.$$

Thus, if we replace $m(a + uv)\gamma^{d-1}(u)$ by $s_0 e^{s_1 u}$ on the right hand side of
(A.34) and replace $m(a + uv)\gamma^{d-1}(u)$ by $\beta s_0 e^{s_1 u}$ on the left hand side of (A.34),
and define $\hat r_1 := r_1 + s_1$ and $\hat r_0 := r_0 + \ln s_0$, we obtain the relation

$$\beta \int_w^{w + t} e^{-u^2 + \hat r_1 u + \hat r_0}\, du \ge M \min\left\{ \int_0^w e^{-u^2 + \hat r_1 u + \hat r_0}\, du,\ \int_{w + t}^{\|b - a\|} e^{-u^2 + \hat r_1 u + \hat r_0}\, du \right\}.$$

This relation is stronger than (A.34) and thus implies (A.34). This relation is
equivalent to

$$\beta \int_w^{w + t} e^{-(u - \hat r_1/2)^2 + \hat r_0 + \hat r_1^2/4}\, du \ge M \min\left\{ \int_0^w e^{-(u - \hat r_1/2)^2 + \hat r_0 + \hat r_1^2/4}\, du,\ \int_{w + t}^{\|b - a\|} e^{-(u - \hat r_1/2)^2 + \hat r_0 + \hat r_1^2/4}\, du \right\}. \qquad (A.35)$$

Now, cancel the term $e^{\hat r_0 + \hat r_1^2/4}$ on both sides and, since we want the inequality
(A.35) to hold for any $w$, (A.35) is implied by

$$\int_w^{w + t} e^{-u^2}\, du \ge \frac{2 t e^{-t^2}}{\sqrt{\pi}} \min\left\{ \int_{-\infty}^w e^{-u^2}\, du,\ \int_{w + t}^{\infty} e^{-u^2}\, du \right\} \qquad (A.36)$$

holding for any $w$. This inequality is Lemma 2.2 in Kannan and Li [28]. ■
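
The one-dimensional inequality to which the argument reduces, $\int_w^{w+t} e^{-u^2} du \ge \frac{2te^{-t^2}}{\sqrt{\pi}} \min\{\int_{-\infty}^w e^{-u^2} du, \int_{w+t}^{\infty} e^{-u^2} du\}$, is easy to spot-check numerically. The sketch below is only a sanity check on a finite grid, not a proof; the helper names and the grid are illustrative choices.

```python
import math

HALF_SQRT_PI = 0.5 * math.sqrt(math.pi)


def gauss_integral(a, b):
    """Integral of exp(-u^2) over [a, b], via the error function."""
    return HALF_SQRT_PI * (math.erf(b) - math.erf(a))


def inequality_gap(w, t):
    """Left side minus right side of the one-dimensional inequality."""
    left = gauss_integral(w, w + t)
    lower_tail = HALF_SQRT_PI * math.erfc(-w)      # integral over (-inf, w]
    upper_tail = HALF_SQRT_PI * math.erfc(w + t)   # integral over [w + t, inf)
    factor = 2.0 * t * math.exp(-t * t) / math.sqrt(math.pi)
    return left - factor * min(lower_tail, upper_tail)


# Evaluate the gap on a grid of interval locations w and lengths t.
gaps = [
    inequality_gap(-4.0 + 0.25 * i, 0.05 * j)
    for i in range(33)
    for j in range(1, 61)
]
worst = min(gaps)  # should be nonnegative up to rounding error
```

A nonnegative `worst` on the grid is consistent with the inequality; the tails are computed with `erfc` to avoid cancellation when $|w|$ is large.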
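Proof of Corollary 1. Consider the change of variables $\tilde x = J^{1/2} x/\sqrt{2}$ and
$\tilde S_i = J^{1/2} S_i/\sqrt{2}$. Then, in $\tilde x$ coordinates, $f(\tilde x) = e^{-\|\tilde x\|^2} m(\sqrt{2} J^{-1/2} \tilde x)$ satisfies the assumption
of Lemma 2 and $d(\tilde S_1, \tilde S_2) \ge \sqrt{\lambda_{\min}/2}\; d(S_1, S_2)$. The result follows by applying Lemma 2
with $\tilde x$ coordinates. ■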

Proof of Lemma 3. The result is immediate from the stated assumptions. ■

Proof of Theorem 2. See Section 3.2. ■

Proof of Lemma 4. Define $K := B(0, R)$, so that $R$ is the radius of $K$;
also let $r := 4\sqrt{d}\,\sigma$ (where $\sigma^2 \le \frac{1}{16 d L^2}$), and let $q(x|u)$ denote the normal
density function centered at $u$ with covariance matrix $\sigma^2 I$. We use the following
notation: $B_u = B(u, r)$, $B_v = B(v, r)$, and $A_{u,v} = B_u \cap B_v \cap K$. By definition
of $r$, we have that $\int_{B_u} q(x|u)\, dx = \int_{B_v} q(x|v)\, dx \ge 1 - P\{|U| > 4\} \ge 1 - 1/10^4$,
where $U \sim N(0, 1)$.

Define the direction $w = (v - u)/\|v - u\|$. Let $H_1 = \{x \in B_u \cap B_v : w'(x - u) \ge
\|v - u\|/2\}$, $H_2 = \{x \in B_u \cap B_v : w'(x - u) \le \|v - u\|/2\}$. Consider the
one-step distributions from $u$ and $v$. We first observe that, in view of Lemma 1 and
Lemma 3, $\inf_{x \in B(y, r)} f(x)/f(y) \ge \beta e^{-Lr}$. Then we have that

$$\|P_u - P_v\|_{TV} \le 1 - \int_K \min\{dP_u, dP_v\} \le 1 - \int_{A_{u,v}} \min\{dP_u, dP_v\}$$

$$= 1 - \int_{A_{u,v}} \min\left\{ q(x|u) \min\left\{ \frac{f(x)}{f(u)}, 1 \right\},\ q(x|v) \min\left\{ \frac{f(x)}{f(v)}, 1 \right\} \right\} dx$$

$$\le 1 - \beta e^{-Lr} \int_{A_{u,v}} \min\{q(x|u), q(x|v)\}\, dx$$

$$\le 1 - \beta e^{-Lr} \left( \int_{H_1 \cap K} q(x|u)\, dx + \int_{H_2 \cap K} q(x|v)\, dx \right),$$

where $\|u - v\| < \sigma/8$. Next we will bound from below the last sum of integrals
for an arbitrary $u \in K$.

We first bound the integrals over the possibly larger sets, respectively $H_1$
and $H_2$. Let $h$ denote the density function of a univariate random variable
distributed as $N(0, \sigma^2)$. It is easy to see that $h(t) = \int_{w'(x - u) = t} q(x|u)\, dx$, i.e.
$h$ is the marginal density of $q(\cdot|u)$ along the direction $w$ up to a translation.
Let $H_3 = \{x : -\|u - v\|/2 < w'(x - u) < \|v - u\|/2\}$. Note that
$B_u \subset H_1 \cup (H_2 - \|u - v\| w) \cup H_3$ where the union is disjoint. Armed with
these observations, we have

$$\int_{H_1} q(x|u)\, dx + \int_{H_2} q(x|v)\, dx = \int_{H_1} q(x|u)\, dx + \int_{H_2 - \|u - v\| w} q(x|u)\, dx$$

$$\ge \int_{B_u} q(x|u)\, dx - \int_{H_3} q(x|u)\, dx$$

$$\ge 1 - \frac{1}{10^4} - \int_{-\|u - v\|/2}^{\|u - v\|/2} h(t)\, dt$$

$$= 1 - \frac{1}{10^4} - \int_{-\|u - v\|/2}^{\|u - v\|/2} \frac{e^{-t^2/2\sigma^2}}{\sqrt{2\pi}\,\sigma}\, dt$$

$$\ge 1 - \frac{1}{10^4} - \frac{1}{8\sqrt{2\pi}}, \qquad (A.37)$$

where we used that $\|u - v\| < \sigma/8$ by the hypothesis of the lemma.

In order to take the support $K$ into account, we can assume that $u, v \in \partial K$,
i.e. $\|u\| = \|v\| = R$ (otherwise the integral will be larger). Let $z = (v + u)/2$
and define the half space $H_z = \{x : z'x \le z'z\}$ whose boundary passes through
$u$ and $v$ (using $\|u\| = \|v\| = R$ it follows that $z'v = z'u = z'z$).

By the symmetry of the normal density, we have

$$\int_{H_1 \cap H_z} q(x|u)\, dx = \frac{1}{2} \int_{H_1} q(x|u)\, dx.$$
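Although $H_1 \cap H_z$ does not lie in $K$ in general, simple arithmetic shows that (see footnote 1)

$$H_1 \cap \left( H_z - \frac{r^2}{R} \frac{z}{\|z\|} \right) \subset K.$$

Using that $\int_{H_1 \cap \left( H_z \setminus \left( H_z - \frac{r^2}{R} \frac{z}{\|z\|} \right) \right)} q(x|u)\, dx \le \int_0^{r^2/R} h(t)\, dt$, we have

$$\int_{H_1 \cap K} q(x|u)\, dx \ge \int_{H_1 \cap \left( H_z - \frac{r^2}{R} \frac{z}{\|z\|} \right)} q(x|u)\, dx \ge \int_{H_1 \cap H_z} q(x|u)\, dx - \int_0^{r^2/R} h(t)\, dt$$

$$\ge \frac{1}{2} \int_{H_1} q(x|u)\, dx - \int_0^{r^2/R} \frac{e^{-t^2/2\sigma^2}}{\sqrt{2\pi}\,\sigma}\, dt
\ge \frac{1}{2} \int_{H_1} q(x|u)\, dx - \frac{r^2}{\sqrt{2\pi}\,\sigma R},$$

where the last step bounds the integrand by $1/(\sqrt{2\pi}\,\sigma)$. Since $r = 4\sqrt{d}\,\sigma$, the
subtracted term equals $16 d \sigma/(\sqrt{2\pi}\, R)$, which is small when $\sigma/R$ is small.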

By symmetry, the same inequality holds when $u$ and $H_1$ are replaced by $v$
and $H_2$ respectively. Adding these inequalities and using (A.37), we have

$$\int_{H_1 \cap K} q(x|u)\, dx + \int_{H_2 \cap K} q(x|v)\, dx \ge 1/3. \qquad (A.38)$$

Thus, we have

$$\|P_u - P_v\|_{TV} \le 1 - \frac{\beta}{3} e^{-Lr}$$

and the result follows since $Lr \le 1$. ■

Proof of Lemma 5. We calculate the probability $p$ of making a proper move.
We will use the notation defined in the proof of Lemma 4. Let $u$ be an arbitrary
point in $K$. We have that

$$p_u = \int_K \min\left\{ \frac{f(x)}{f(u)}, 1 \right\} q(x|u)\, dx \ge \beta e^{-Lr} \int_{B_u \cap K} q(x|u)\, dx \ge \beta e^{-Lr}\, \frac{1}{3},$$

where we used that $\inf_{x \in B(y, r)} f(x)/f(y) \ge \beta e^{-Lr}$ by Lemma 1 and Lemma 3
and the bound (A.38) for the case that $u = v$ so that $B_u = H_1 \cup H_2$. Since
$Lr \le 1$ we conclude that $p_u \ge \beta/(3e)$.

We then note that for Q(A) > the ratio Qq(A)/Q(A) is bounded above by 
sup xeK dQ Q (x)/dQ(x); dQ (x)/dx is bounded above by p-^-IHI 2 /^ 2 ■{2-Ko 2 y d l 2 < 
p^ 1 ■ {2ixo 2 )~ d l 2 \ and dQ{x)/dx is bounded over x € K below by {2ir)~ d / 2 
det(J 1 /2) e -i x ' Jx (3 1 ' 2 > (27r)- d / 2 A^„e-^l K ll 2 //3 1 / 2 , where /3 = e ~ 2 ^+^ K ^/ 2 \ 



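1 Indeed, take $y \in H_1 \cap \left( H_z - \frac{r^2}{R} \frac{z}{\|z\|} \right)$. We can write $y = \frac{z'y}{\|z\|} \frac{z}{\|z\|} + s$, where $\|s\| \le r$
(since $\|s\| \le \|y - z\| = \|y - \frac{u + v}{2}\| \le \frac{1}{2}\|y - u\| + \frac{1}{2}\|y - v\| \le r$) and $s$ is also
orthogonal to $z$. Since $y \in \left( H_z - \frac{r^2}{R} \frac{z}{\|z\|} \right)$, we have $\frac{z'y}{\|z\|} \le \frac{z'z}{\|z\|} - \frac{r^2}{R} = \|z\| - \frac{r^2}{R} \le R - \frac{r^2}{R}$.
Therefore, $\|y\| = \sqrt{\left( \frac{z'y}{\|z\|} \right)^2 + \|s\|^2} \le \sqrt{\left( R - \frac{r^2}{R} \right)^2 + r^2} = \sqrt{R^2 - r^2\left( 1 - \frac{r^2}{R^2} \right)} \le R$.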
Thus, we can bound

max A( z A . Q{A)>0 -Q^jy <p u a X min ' 

<3e[120VdA mM ||X||/ % /^~] d e ^l^ll 2 //5- 3 / 2 

< 3[120||X||2]rf e 3e I +2e 2 ||K||=+l ) 



where we used the bound on a given in (13.22[) . and the fact that ||-K"||,/ > 
y/KninWKW and \\K\\j > \fd ^J\ ma xl\nin (cf. Comment 

The remaining results in the Lemma follow by invoking the CLT conditions. 



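Proof of Theorem 3. We have that, for $\lambda^B$ denoting the random variable
with law $Q_B$ and $\lambda$ denoting the random variable with law $Q$, and $MSE(\hat\mu_g | X)$
denoting the mean square error $E[(\hat\mu_g - \mu_g)^2 | X]$ conditional on the element $\lambda^{1,B}$
drawn according to $X = \lambda^B$ or $X = \lambda$:

$$MSE(\hat\mu_g) = E_{Q_B}\left[ MSE(\hat\mu_g | \lambda^B) \right] = E_Q\left[ MSE(\hat\mu_g | \lambda) \frac{dQ_B(\lambda)}{dQ(\lambda)} \right]$$

$$= E_Q\left[ MSE(\hat\mu_g | \lambda) \right] + E_Q\left[ MSE(\hat\mu_g | \lambda) \left( \frac{dQ_B(\lambda)}{dQ(\lambda)} - 1 \right) \right]$$

$$\le E_Q\left[ MSE(\hat\mu_g | \lambda) \right] + 4 \bar g^2 E_Q\left| \frac{dQ_B(\lambda)}{dQ(\lambda)} - 1 \right|$$

$$= \sigma^2_{g,N}/N + 8 \bar g^2 \|Q_B - Q\|_{TV},$$

where $\sigma^2_{g,N}$ is $N$ times the variance of the sample average when the Markov
chain starts from the stationary distribution $Q$. We also used the fact that
$\|Q_B - Q\|_{TV} = \frac{1}{2} \int |dQ_B/dx - dQ/dx|\, dx$.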

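The bound on $\sigma^2_{g,N}$ will depend on the particular scheme, as discussed below.
We begin by bounding the burn-in period $B$.

We require the second term in the bound for $MSE(\hat\mu_g)$ to be smaller
than $\epsilon/3$, which is equivalent to imposing that $\|Q_B - Q\|_{TV} \le \epsilon/(24 \bar g^2)$. Using the
conductance theorem of [37] restated in equation (3.15), since $Q_0$ is $M$-warm
with respect to $Q$, we require that

$$\sqrt{M} \left( 1 - \frac{\phi^2}{2} \right)^B \le \sqrt{M} e^{-B \phi^2/2} \le \frac{\epsilon}{24 \bar g^2}, \quad \text{or} \quad B \ge \frac{2}{\phi^2} \ln\left( \frac{24 \sqrt{M}\, \bar g^2}{\epsilon} \right).$$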

Next we bound $\sigma^2_{g,N}$. Specifically, we determine the number of post-burn-in
iterations $N_{lr}$, $N_{ss}$, or $N_{ms}$ needed to set $MSE(\hat\mu_g) \le \epsilon$.

1. To bound $N_{lr}$, note that $\sigma^2_{g,N} \le \gamma_0 \frac{4}{\phi^2}$, where the inequality follows
from the conductance-based covariance bound of [37] restated in equation (4.29).
Thus, $N_{lr} = \frac{\gamma_0}{\epsilon} \frac{6}{\phi^2}$ and $B$ set above suffice to obtain $MSE(\hat\mu_g) \le \epsilon$.



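2. To bound $N_{ss}$, we first must choose a spacing $S$ to ensure that the
autocovariances are sufficiently small. We start by bounding

$$\sigma^2_{g,N} \le \gamma_0 + 2N|\gamma_S| \le \gamma_0 + 2N \gamma_0 \left( 1 - \frac{\phi^2}{2} \right)^S,$$

where we used the conductance-based covariance bound of [37] restated in
equation (4.29) and that $\lambda^{i,B}$ and $\lambda^{i+1,B}$ are spaced by $S$ steps of the chain. By
choosing $S$ as

$$\left( 1 - \frac{\phi^2}{2} \right)^S \le e^{-S \phi^2/2} \le \frac{\epsilon}{6 \gamma_0}, \quad \text{or} \quad S \ge \frac{2}{\phi^2} \ln\left( \frac{6 \gamma_0}{\epsilon} \right),$$

and using $N_{ss} = \frac{3 \gamma_0}{\epsilon}$ we obtain

$$MSE(\hat\mu_g) \le \frac{\epsilon}{3 \gamma_0} \left( \gamma_0 + 2 N_{ss} |\gamma_S| \right) + 8 \bar g^2 \|Q_B - Q\|_{TV}
\le \frac{\epsilon}{3 \gamma_0} \left( \gamma_0 + 2 \frac{3 \gamma_0}{\epsilon} \gamma_0 \frac{\epsilon}{6 \gamma_0} \right) + 8 \bar g^2 \frac{\epsilon}{24 \bar g^2} = \epsilon.$$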


3. To bound $N_{ms}$, we observe, using that $\lambda^{i,B}$, $i = 1, 2, \ldots$, are i.i.d. across $i$,
that $MSE(\hat\mu_g) \le \gamma_0/N_{ms} + \epsilon/3 \le \epsilon$ provided that $N_{ms} \ge 3\gamma_0/(2\epsilon)$. ■
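
The run lengths in this proof are explicit functions of the conductance $\phi$, the warmness $M$, the bound $\bar g$, the variance $\gamma_0$ and the target $\epsilon$. The sketch below computes them under the assumption that the constants are as read from the argument above (burn-in target $\epsilon/(24\bar g^2)$, $N_{lr} = 6\gamma_0/(\epsilon\phi^2)$, $N_{ss} = 3\gamma_0/\epsilon$); function and argument names are hypothetical.

```python
import math


def burn_in_length(phi, warmness, g_bar, eps):
    """Smallest integer B with sqrt(M) * exp(-B * phi^2 / 2) <= eps / (24 * g_bar^2)."""
    target = 24.0 * math.sqrt(warmness) * g_bar ** 2 / eps
    return max(0, math.ceil(2.0 / phi ** 2 * math.log(target)))


def long_run_draws(phi, gamma0, eps):
    """Post-burn-in draws for the long-run scheme: N_lr = (gamma0 / eps) * (6 / phi^2)."""
    return math.ceil(gamma0 / eps * 6.0 / phi ** 2)


def subsample_spacing(phi, gamma0, eps):
    """Smallest integer S with exp(-S * phi^2 / 2) <= eps / (6 * gamma0)."""
    return max(0, math.ceil(2.0 / phi ** 2 * math.log(6.0 * gamma0 / eps)))


def subsample_draws(gamma0, eps):
    """Number of spaced draws for the subsample scheme: N_ss = 3 * gamma0 / eps."""
    return math.ceil(3.0 * gamma0 / eps)


# Example with made-up inputs; phi is typically unknown and must be bounded.
phi, warmness, g_bar, gamma0, eps = 0.1, 100.0, 1.0, 1.0, 0.01
B = burn_in_length(phi, warmness, g_bar, eps)
N_lr = long_run_draws(phi, gamma0, eps)
S = subsample_spacing(phi, gamma0, eps)
N_ss = subsample_draws(gamma0, eps)
```

In this sketch the long-run scheme uses $B + N_{lr}$ chain steps and the subsample scheme about $B + S \cdot N_{ss}$, which makes the trade-off between the two schemes easy to compare for given inputs.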

Proof of Theorem 4. Given

$$K = B(0, \|K\|) \quad \text{where} \quad \|K\|^2 = cd,$$

condition C.l holds by an argument given in proof of Ghosal's Lemma 4. Let 
A„(c) = J~^B ln (0) + ^B 2n (c). Our condition C.2 is satisfied by an argument 
similar to that given in the proof of Ghosal's Lemma 1 with 

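$$\epsilon_1 = O\left( \lambda_n(c) \|s\|^2 \right) = O_p\left( \lambda_n(c)\, d \right) = O_p\left( d^{3/2}/n^{1/2} \right) = o_p(1) \quad \text{and}$$

$$\epsilon_2 = O\left( \lambda_n(c) \right) = O_p\left( d^{1/2}/n^{1/2} \right) = o_p(1/d),$$

and our condition C.3 is satisfied since $\epsilon_2 \|K\|^2 = o_p(1)$. ■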

Comment A.l Ghosal [20] proves his results for the support set K 1 = B(0, C^/d\og 
His arguments actually go through for the support set $K = B(0, C\sqrt{d})$ due to
the concentration of normal measure under $d \to \infty$ asymptotics. For details, see

W- 

Proof of Theorem 5. Take $K = B(0, \|K\|)$, where $\|K\|^2 = Cd$, for some $C$

sufficiently large independent of d (see H] for details). Let A„(c) = y^I?i n (0) + 

— B2 n (c). Then condition C.l is satisfied by the argument given in the proof of 
Ghosal's Lemma 4 and NE.3. Further, condition C.2 is satisfied by the argument 
similar to that given in the proof of Ghosal's Lemma 1 and by NE.3 with 

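$$\epsilon_1 = O_p\left( \delta_{1n} d^{1/2} + \delta_{2n} d + \lambda_n(C)\left( \delta_{1n} d^{1/2} + \delta_{2n} d^{1/2} + d \right) \right) = o_p(1),$$

$$\epsilon_2 = O_p\left( \lambda_n(C) \right) = o_p\left( d^{1/2}/n^{1/2} \right) = o_p(1/d),$$

and condition C.3 is satisfied since $\epsilon_2 \|K\|^2 = o_p(1)$. ■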
Comment A.2 For further details, see J^j.

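Proof of Theorem 6. We will first establish the following linear approximation
for $S_n(\theta)$ in a neighborhood of $\theta_0$:

$$\sup_{\|\theta - \theta_0\| \le C\sqrt{d/n}} \left\| S_n(\theta) - S_n(\theta_0) - n^{1/2} A(\theta - \theta_0) \right\| = o_p(d^{-1/2}) \qquad (A.39)$$

for any fixed constant $C > 0$. For notational convenience let

$$\delta_n(\theta) = S_n(\theta) - S_n(\theta_0) - n^{1/2} A(\theta - \theta_0), \qquad W_n(\theta) = S_n(\theta) - S_n(\theta_0) - E\left[ S_n(\theta) - S_n(\theta_0) \right]. \qquad (A.40)$$

Let $\mathcal{F}_n = \{\eta'(m(X, \theta) - m(X, \theta_0)) : \|\theta - \theta_0\| \le \rho_n, \eta \in S^{d_1}\}$. Under condition
ZE.1, we apply the following maximal inequality adopted from He and Shao [22]
(see [S] for details) to an empirical process indexed by members of $\mathcal{F}_n$:

$$\sup_{f \in \mathcal{F}_n} \left| n^{-1/2} \sum_{i=1}^n \left( f(X_i) - E[f(X_i)] \right) \right| = O_p\left( \sqrt{V} \sqrt{\log n} \left( \sup_{f \in \mathcal{F}_n} E[f^2] + n^{-1} V M^2 \log n \right)^{1/2} \right). \qquad (A.41)$$

Here the multiplier $\sqrt{V}$ arises as the order of the uniform bracketing entropy
integral, where $V$ is the VC dimension of a VC function class $\mathcal{F}_n$ or an
entropically equivalent class $\mathcal{F}_n$. We assumed in ZE.1 that $V = O(d)$. Also $M$ is the
a.s. bound on the envelope of $\mathcal{F}_n$, assumed to be of order $O(\sqrt{d})$. Finally, we
assumed that $\sup_{f \in \mathcal{F}_n} (E[f^2])^{1/2} = O(\rho_n^{\alpha})$. Therefore, we have that uniformly
in $\theta \in \Theta_n$

$$\|W_n(\theta)\| = O_p\left( \sqrt{d} \sqrt{\log n} \left( \|\theta - \theta_0\|^{2\alpha} + n^{-1} d M^2 \log n \right)^{1/2} \right)
= O_p\left( \sqrt{d \log n}\, \|\theta - \theta_0\|^{\alpha} + n^{-1/2} d^{3/2} \log n \right). \qquad (A.42)$$

Note that (A.42) and an expansion with an integral remainder around $\theta = \theta_0$
show that uniformly in $\theta \in \Theta_n$

$$\|\delta_n(\theta)\| \le \|W_n(\theta)\| + \left\| \nabla^2 E[S_n(\xi)] \cdot [\theta - \theta_0, \theta - \theta_0] \right\|$$

$$= O_p\left( d^{1/2} \log^{1/2} n\, \|\theta - \theta_0\|^{\alpha} + n^{-1/2} d^{3/2} \log n \right) + O_p\left( \sqrt{dn}\, \|\theta - \theta_0\|^2 \right),$$

where $\xi$ lies between $\theta$ and $\theta_0$ and we used ZE.2, which imposes
$\|\nabla^2 E[S_n(\xi)] \cdot [\gamma, \gamma]\| = O(\sqrt{dn}\, \|\gamma\|^2)$. Evaluating the three terms at
$\|\theta - \theta_0\| \le C\sqrt{d/n}$, the condition (A.39) follows from the growth conditions
ZE.3(a) and ZE.3(b).

Building upon (A.39), Lemmas 7 and 8 verify that conditions C.1-C.3 hold,
proving Theorem 6. ■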

Lemma 7 Under conditions ZE.1-ZE.3, conditions C.2 and C.3 hold for $K =
B(0, C\sqrt{d})$ for any fixed constant $C > 0$.

Proof of Lemma 7. Let $s = -(A'A)^{-1} A' S_n(\theta_0)$ be a first order approximation
for the extremum estimator. For $\theta = \theta_0 + (s + \lambda)/\sqrt{n}$ and $\hat\theta = \theta_0 + s/\sqrt{n}$,

$$\ln \ell(\lambda) = -\|S_n(\theta)\|^2 + \|S_n(\hat\theta)\|^2$$

$$= -\lambda' A'A \lambda - \|r_n\|^2 - 2 r_n' A \lambda - 2 r_n' S_n(\hat\theta)$$

$$= -\lambda' A'A \lambda + o_p(1),$$

where $r_n = \delta_n(\theta) - \delta_n(\hat\theta)$ for $\delta_n(\theta)$ defined in (A.40). Indeed, using (A.39) we
have $\|\delta_n(\theta)\| = o_p(d^{-1/2})$ and $\|\delta_n(\hat\theta)\| = o_p(d^{-1/2})$ uniformly over $\lambda \in K$;
using (A.39) we have $\|S_n(\hat\theta)\| = O_p(d^{1/2})$; and moreover, $\|\lambda\| = O(d^{1/2})$, and
$\|s\| = O_p(d^{1/2})$ by the Chebyshev inequality. Thus, conditions C.2 and C.3 follow
with $\epsilon_1 = o_p(1)$, $\epsilon_2 = 0$, and $J = 2A'A$. ■
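
For intuition, the expansion $\ln \ell(\lambda) = -\lambda' A'A \lambda + o_p(1)$ holds exactly, with no remainder, when the moment function is linear. The sketch below checks this for the hypothetical mean moment $m(x, \theta) = x - \theta$, for which $A = -I$ and $s = S_n(\theta_0)$; the data and the value of $\lambda$ are arbitrary illustrative choices.

```python
import math
import random


def log_ell_linear(lam, xs, theta0):
    """ln l(lambda) = Q_n(theta0 + (lambda + s)/sqrt(n)) - Q_n(theta0 + s/sqrt(n))
    for the hypothetical linear moment m(x, theta) = x - theta."""
    n, d = len(xs), len(theta0)
    root_n = math.sqrt(n)
    xbar = [sum(x[j] for x in xs) / n for j in range(d)]
    # With A = -I, s = -(A'A)^{-1} A' S_n(theta0) reduces to S_n(theta0).
    s = [root_n * (xbar[j] - theta0[j]) for j in range(d)]

    def q_n(theta):
        # Q_n(theta) = -||S_n(theta)||^2 with S_n(theta) = sqrt(n) * (xbar - theta)
        return -sum((root_n * (xbar[j] - theta[j])) ** 2 for j in range(d))

    theta = [theta0[j] + (lam[j] + s[j]) / root_n for j in range(d)]
    theta_hat = [theta0[j] + s[j] / root_n for j in range(d)]
    return q_n(theta) - q_n(theta_hat)


random.seed(2)
d, n = 3, 500
xs = [tuple(random.gauss(0.0, 1.0) for _ in range(d)) for _ in range(n)]
lam = [0.7, -1.2, 0.4]
value = log_ell_linear(lam, xs, [0.0] * d)
quadratic = -sum(l * l for l in lam)  # -lambda' A'A lambda with A = -I
```

Here `value` and `quadratic` agree up to rounding, so $J = 2A'A = 2I$ and the quasi-posterior of $\lambda$ is exactly $N(0, J^{-1})$ restricted to the local parameter space; nonlinear or non-smooth moments add the $o_p(1)$ remainder controlled by (A.39).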

Lemma 8 Under the conditions ZE.1, ZE.2, and ZE.3 there exists a constant
$C > 0$ such that by setting $K = B(0, C\sqrt{d})$ we have $\int_{K^c} \ell(\lambda)\, d\lambda = o_p\left( \int_K \ell(\lambda)\, d\lambda \right)$
and condition C.1 holds.

Proof of Lemma 8. For notational convenience we conduct the proof in the
original parameter space. Let $\hat\theta = \theta_0 + s/\sqrt{n}$ and $\epsilon > 0$ be any small positive
constant. Since $\|s\| = O_p(d^{1/2})$, there is a constant $\bar C$ such that $\|s\| \le \bar C d^{1/2}$,
with asymptotic probability no smaller than $1 - \epsilon$. Below we replace the last
phrase by "wp $1 - \epsilon$".

Now, since $E[S_n(\theta_0)] = 0$, we have that

$$S_n(\theta) = W_n(\theta) + S_n(\theta_0) + E[S_n(\theta)], \qquad (A.43)$$

where $W_n(\theta)$ is defined in (A.40).
Next, define for $C > \tilde C + \bar C$ the sets

$$\tilde K = B\left( \theta_0, \tilde C \sqrt{d/n} \right) \subset K = B\left( \hat\theta, C \sqrt{d/n} \right), \qquad (A.44)$$

where the inclusion holds wp $1 - \epsilon$. Note that these sets are centered on different
points. We will show that for a sufficiently large constant $\tilde C$

$$\int_{K^c} \exp\left( -\|S_n(\theta)\|^2 \right) d\theta = o_p\left( \int_K \exp\left( -\|S_n(\theta)\|^2 \right) d\theta \right),$$

which implies the claim of the lemma.

Step 1. Relative bound on $\|S_n(\theta_0)\|$. Note that $\|S_n(\theta_0)\| = O_p(d^{1/2})$ by the
Chebyshev inequality. Using equation (5.32) of condition ZE.2, we have that

$$\|E[S_n(\theta)]\|^2 \ge \left( \sqrt{n} \left( \sqrt{\mu}\, \|\theta - \theta_0\| \wedge \delta \right) \right)^2 \ge \left( \tilde C \sqrt{\mu} \sqrt{d} \right)^2, \quad \forall \theta \in \tilde K^c,$$

since $\|\theta - \theta_0\| \ge \tilde C \sqrt{d/n}$. Therefore, there exists $\tilde C$ such that wp $1 - \epsilon$

$$\|E[S_n(\theta)]\| \ge 5 \|S_n(\theta_0)\| \quad \text{uniformly in } \theta \in \tilde K^c. \qquad (A.45)$$

Step 2. Relative bound on $\|W_n(\theta)\|$. Using equation (A.42), we have that
uniformly in $\theta \in \Theta_n \subset B(\theta_0, T_n)$

$$\|W_n(\theta)\| = O_p\left( \sqrt{d \log n}\, \|\theta - \theta_0\|^{\alpha} + n^{-1/2} d^{3/2} \log n \right).$$

Building on that, we will show that $\|W_n(\theta)\| = o_p\left( \sqrt{n} (\delta \wedge \|\theta - \theta_0\|) \right)$ uniformly
on $\theta \in \tilde K^c$, and therefore

$$\|W_n(\theta)\| = o_p\left( \|E[S_n(\theta)]\| \right), \quad \text{uniformly in } \theta \in \tilde K^c. \qquad (A.46)$$

For the case that $\delta \le \|\theta - \theta_0\| \le T_n$ it suffices to have $\sqrt{d \log n}\, T_n^{\alpha} +
n^{-1/2} d^{3/2} \log n = o(n^{1/2})$. On the other hand, for $\tilde C \sqrt{d/n} \le \|\theta - \theta_0\| \le \delta$
it suffices to have $\sqrt{d \log n}\, \|\theta - \theta_0\|^{\alpha} + n^{-1/2} d^{3/2} \log n = o(\sqrt{n}\, \|\theta - \theta_0\|)$. Indeed,
$\sqrt{d \log n}\, \|\theta - \theta_0\|^{\alpha} = o(\sqrt{n}\, \|\theta - \theta_0\|)$ if $\sqrt{d \log n} = o(\sqrt{n}\, \|\theta - \theta_0\|^{1 - \alpha})$, which is
implied by $\sqrt{d \log n} = o(\sqrt{n}\, (d/n)^{(1 - \alpha)/2})$. Moreover, $n^{-1/2} d^{3/2} \log n = o(\sqrt{n}\, \|\theta - \theta_0\|)$
if $n^{-1/2} d^{3/2} \log n = o(\sqrt{n} \sqrt{d/n})$. All of the above conditions hold under
condition ZE.3.

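Step 3. Lower bound on $\|S_n(\theta)\|$. We will show that

$$\|S_n(\theta)\|^2 = \|E[S_n(\theta)] + S_n(\theta_0) + W_n(\theta)\|^2 \ge \tfrac{1}{2} \|E[S_n(\theta)]\|^2 \qquad (A.47)$$

uniformly for all $\theta \in \tilde K^c$ wp $1 - 2\epsilon$.

For any two vectors $a$ and $b$, we have $\|a + b\|^2 \ge (\|a\| - \|b\|)^2 = \|a\|^2 -
2\|a\|\|b\| + \|b\|^2 \ge \|a\|^2 (1 - 2\|b\|/\|a\|)$. Applying this relation with $a = E[S_n(\theta)]$
and $b = W_n(\theta) + S_n(\theta_0)$, (A.45), and (A.46), we obtain (A.47).

Step 4. Bounding the integrals. Using (A.47) and ZE.2, wp $1 - 3\epsilon$

$$\int_{K^c} \exp\left( -\|S_n(\theta)\|^2 \right) d\theta \le \int_{\tilde K^c} \exp\left( -\|S_n(\theta)\|^2 \right) d\theta$$

$$\le \int_{\tilde K^c} \exp\left( -\tfrac{1}{2} \|E[S_n(\theta)]\|^2 \right) d\theta$$

$$\le \int_{\tilde K^c} \exp\left( -\tfrac{1}{2} \mu n \|\theta - \theta_0\|^2 \right) d\theta + \int_{\Theta_n} \exp\left( -\tfrac{1}{2} n \delta^2 \right) d\theta$$

$$\le (2\pi)^{d/2} (n\mu)^{-d/2} P\left( \|U\| > \tilde C \sqrt{d/n} \right) + \exp\left( -\tfrac{1}{2} n \delta^2 \right) \mathrm{vol}(\Theta_n)$$

$$\le (2\pi)^{d/2} (n\mu)^{-d/2} \exp\left( -\tfrac{1}{2} \left( \tilde C - 1/\sqrt{\mu} \right)^2 \mu d \right) + v_d T_n^d \exp\left( -\tfrac{1}{2} n \delta^2 \right),$$

where $v_d$ is the volume of the $d$-dimensional unit ball, which goes to zero as $d$
grows, and $U \sim N(0, \frac{1}{\mu n} I_d)$. In the first line we used the inclusion (A.44), and in
the last line we used a standard Gaussian concentration inequality, Proposition
2.2 in Talagrand [51], and the fact that $E[\|U\|] \le (E[\|U\|^2])^{1/2} = \frac{1}{\sqrt{\mu}} \sqrt{d/n}$.
On the other hand, by Lemma 7 we have

$$-\|S_n(\theta)\|^2 + \|S_n(\hat\theta)\|^2 = -n \|A(\theta - \hat\theta)\|^2 + o_p(1)$$

uniformly for $\theta \in K$. This yields that wp $1 - \epsilon$

$$\int_K \exp\left( -\|S_n(\theta)\|^2 \right) d\theta \ge \exp\left( -\|S_n(\hat\theta)\|^2 \right) \int_K \exp\left( -n \|A(\theta - \hat\theta)\|^2 + o_p(1) \right) d\theta$$

$$\ge \exp(-C_2 d)\, (1 + o(1)) \int_K \exp\left( -C_1 n \|\theta - \hat\theta\|^2 \right) d\theta$$

$$= \exp(-C_2 d)\, (2\pi)^{d/2} (2 C_1 n)^{-d/2} \left( 1 - P\left( \|U\| > C \sqrt{d/n} \right) \right) (1 + o(1))$$

$$\ge \exp(-C_2 d)\, (2\pi)^{d/2} (2 C_1 n)^{-d/2} (1 - o(1)),$$

where the constant $C_1$ is the maximal eigenvalue of $A'A$, the constant $C_2$ is such that
$\|S_n(\hat\theta)\|^2 \le C_2 d$ wp $1 - \epsilon$ by Lemma 7, and here $U \sim N(0, \frac{1}{2 C_1 n} I_d)$. In the last line we used
the standard Gaussian concentration inequality, Proposition 2.2 in Talagrand
[51], with constant $C > 2/\sqrt{C_1}$ to get $P(\|U\| > C \sqrt{d/n}) = o(1)$.
Finally, we obtain that wp $1 - 5\epsilon$

$$\frac{\int_{K^c} \exp\left( -\|S_n(\theta)\|^2 \right) d\theta}{\int_K \exp\left( -\|S_n(\theta)\|^2 \right) d\theta}
\le \frac{(2\pi)^{d/2} (\mu n)^{-d/2} \exp\left( -\tfrac{1}{2} \left( \tilde C - 1/\sqrt{\mu} \right)^2 \mu d \right) + v_d T_n^d \exp\left( -\tfrac{1}{2} n \delta^2 \right)}{\exp(-C_2 d)\, (2\pi)^{d/2} (2 C_1 n)^{-d/2} (1 + o(1))},$$

where the right hand side is $o(1)$ by choosing $\tilde C > 0$ sufficiently large, and noting
that the terms $(2\pi)^{d/2} n^{-d/2}$ cancel and that $d \ln T_n = o(n)$ by condition ZE.3.
Since $\epsilon > 0$ can be set as small as we like, the conclusion follows. ■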

Appendix B: Bounding log-$\beta$-concave functions

Lemma 9 Let $f : \mathbb{R} \to \mathbb{R}$ be a unidimensional log-$\beta$-concave function. Then
there exists a logconcave function $g : \mathbb{R} \to \mathbb{R}$ such that

$$\beta g(x) \le f(x) \le g(x) \quad \text{for every } x \in \mathbb{R}.$$

Proof. Consider $h(x) = \ln f(x)$, a $(\ln \beta)$-concave function. Now, let $m$ be the
smallest concave function greater than $h(x)$ for every $x$, that is,

$$m(x) = \sup\left\{ \sum_{i=1}^k \lambda_i h(y_i) : k \in \mathbb{N},\ \lambda \in \mathbb{R}^k,\ \lambda \ge 0,\ \sum_{i=1}^k \lambda_i = 1,\ \sum_{i=1}^k \lambda_i y_i = x \right\}.$$

Recall that the epigraph of a function $w$ is defined as $\mathrm{epi}_w = \{(x, t) : t \le
w(x)\}$. Using our definitions, we have that $\mathrm{epi}_m = \mathrm{conv}(\mathrm{epi}_h)$ (the convex hull
of $\mathrm{epi}_h$), where both sets lie in $\mathbb{R}^2$. In fact, the values of $m$ are defined only
by points in the boundary of $\mathrm{conv}(\mathrm{epi}_h)$. Consider $(x, m(x)) \in \mathrm{epi}_m$; since the
epigraph is convex and this point is on the boundary, there exists a supporting
hyperplane $H$ at $(x, m(x))$. Moreover, $(x, m(x)) \in \mathrm{conv}(\mathrm{epi}_h \cap H)$. Since $H$ is
one dimensional, $(x, m(x))$ can be written as a convex combination of at most 2
points of $\mathrm{epi}_h$.

Furthermore, by definition of log-$\beta$-concavity, we have that

$$\ln(1/\beta) \ge \sup_{\lambda \in [0, 1],\, y,\, z} \lambda h(y) + (1 - \lambda) h(z) - h(\lambda y + (1 - \lambda) z).$$

Thus, $h(x) \le m(x) \le h(x) + \ln(1/\beta)$. Exponentiating gives $f(x) \le g(x) \le
\frac{1}{\beta} f(x)$, where $g(x) = e^{m(x)}$ is a logconcave function. ■
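
The construction in this proof can be imitated numerically: on a grid, the smallest concave majorant of $h = \ln f$ is the upper convex hull of the points $(x_i, h(x_i))$. The sketch below is an illustration under assumptions: the test function $h(x) = -x^2 + 0.3\cos(6x)$ and the grid are arbitrary, and the value of $\beta$ it reports is only the grid estimate $\exp(-\max_i (m(x_i) - h(x_i)))$.

```python
import math


def upper_concave_envelope(xs, hs):
    """Least concave majorant of the points (xs[i], hs[i]), with xs strictly
    increasing, evaluated back on xs (upper hull plus linear interpolation)."""
    hull = []  # indices of the upper-hull vertices, left to right
    for i in range(len(xs)):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            # b is dropped when it lies on or below the chord from a to i
            cross = (xs[b] - xs[a]) * (hs[i] - hs[a]) - (hs[b] - hs[a]) * (xs[i] - xs[a])
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    env, k = [], 0
    for i, x in enumerate(xs):
        while hull[k + 1] < i:  # advance to the hull segment containing x
            k += 1
        a, b = hull[k], hull[k + 1]
        w = (x - xs[a]) / (xs[b] - xs[a])
        env.append((1.0 - w) * hs[a] + w * hs[b])
    return env


xs = [-2.0 + 0.01 * i for i in range(401)]
hs = [-x * x + 0.3 * math.cos(6.0 * x) for x in xs]   # h = ln f, not concave
ms = upper_concave_envelope(xs, hs)                   # m = concave majorant of h
gap = max(m - h for m, h in zip(ms, hs))              # grid estimate of ln(1/beta)
beta = math.exp(-gap)
f = [math.exp(h) for h in hs]
g = [math.exp(m) for m in ms]                         # logconcave majorant of f
```

By construction $\beta g(x_i) \le f(x_i) \le g(x_i)$ on the grid, which mirrors the sandwich in Lemma 9.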